




ELEMENTARY

STATISTICAL METHODS 


By 

E. C. RHODES, B.A., D.Sc. 

Reader in Statistics in the University of London 



LONDON 

GEORGE ROUTLEDGE & SONS, LIMITED 

BROADWAY HOUSE: 68-74 CARTER LANE, E.C.4 






First Published 1933
Second impression September 1935
Third impression 1937
Fourth impression January 1941
Fifth impression January 1944
Sixth impression May 1945
Seventh impression July 1946
Eighth impression January 1948 



PRINTED IN GREAT BRITAIN BY
LUND HUMPHRIES
LONDON BRADFORD



CONTENTS 

CHAP.

1. Statistics
2. Statistical Inquiries
3. Assembling Statistical Data
4. Secondary or Derived Statistics
5. Comparison of Averages
6. The Calculation of Averages
7. Graphical Methods
8. The Median and Measures of Dispersion
9. Weighted Sums and Weighted Averages
10. Index Numbers
11. Graphs of Time Series
12. Analysis of Time Series
Index




Chapter 1

STATISTICS


The functions of a Statistician may properly be con- 
sidered as divisible into three parts. In the first place 
he is concerned with the assembling of statistical data, 
in the second place with their analysis, in the third place 
with the interpretation of the results of such an analysis. 
Sometimes, owing to the division of labour in a highly 
organized society, a particular statistician may only be 
concerned with one or two of these functions. Thus, 
he may be solely engaged in the analysis of statistics, 
without bothering himself about the methods of collection 
of the data or with the possible interpretations which 
may be put to his results. This kind of subdivision of 
the statistician's duties may not always be favourable 
to the best elucidation of a particular problem, for the 
persons assembling the data may know little or nothing 
of the superstructure of deduction which is subsequently 
to be built on the foundation of facts— if they knew more 
they might modify somewhat their methods, in order that 
this superstructure would have less likelihood of being 
blown down by a storm of criticism. Similarly, those
who attempt only the interpretation of the results of a
statistical analysis, without knowing much of the sources 
of the data and the methods of analysis, would perhaps 
modify their conclusions if they had followed through 
the whole process from the raw material stage to that 
of finished product. 

At whatever stage a statistician is concerned with the 
data, he ought to know something definite about the 
processes to which these data are submitted in their 
earlier and later phases of treatment. Otherwise we get 
the phenomenon of persons earnestly engaged in analysing 
statistical data without knowing exactly why they are 
doing so, or persons collecting data which are valueless 
in their present form, but which might have been of great 
use if the method of obtaining them had been slightly 
modified, or others putting an interpretation on the 
results of an analysis which the original data do not 
justify. 

We must therefore concern ourselves with these various
aspects of the work of a statistician, and in the first place
we must ask ourselves “What are statistical data?”
These in their original form are facts relating to
a group of units which are susceptible of counting or
numerical appreciation in some form or other. Thus, we
may be concerned with a number of factories manufacturing
a given product. The numerical facts relating
to machines in operation, persons employed, hours worked,
wages paid, and so on are statistical data. We may be
dealing with farms, their acreage, the number of livestock,
and so on. We may be dealing with heights of
a group of persons or their incomes or their state of health.
Statistical data are available in large quantity in records
of various kinds, books kept by firms, publications relating
to trade, population, and so on.

Such pieces of information, in order to be appreciated 
at their true worth, should be precise. Considerations 
of space and time are elementary as the foundations of 
such precision. We must know definitely to what 
particular region or place the statistics refer, and to what 
instant or period of time they have relation. This regard 
to space and time is but one aspect of a general con- 
sideration of exactness of definition of particular terms 
which are used in the description of the statistics. For 
a proper appreciation of the exact meaning of statistical 
data, we must be acquainted with the precise significance 
of such words as “population”, “manufacturing concern”,
“trade”, “port”, etc., in the context. All such
definitions necessarily involve reference to an area and 
to time. 

We realize, immediately we think about the matter, 
that statistical data are only of value for comparative 
purposes, and, realizing this, we see the necessity for 
precision, because if we wish to compare two things which 
are described in the same way on different occasions, we 
must be sure that the same words used on these occasions 
have really the same meaning. 

We may, on occasion, make a vague statement such as,
“The population of England and Wales nowadays is
very large”, and such a statement in its right context
probably conveys quite adequately our intention. Or
we may say instead “The population of England and
Wales nowadays is about 40 millions”; here we replace
the adjectival description “very large” by “about
40 millions” and probably convey the same kind of
impression to others; we might equally well have used
“45 millions” or “50 millions” to convey the same
impression of the large size of the population, and no one
would worry to examine the exactitude of the figure.
No precise significance attaches to such statements as
“There are 3,000,000 people in Great Britain suffering
from nervous disorders” (Dr. Elizabeth Sloan Chesser,
Evening Standard, 4th April, 1933). The point is that in
such bare statements of fact the figures are really being
used in an adjectival sense. But when we come to other
statements involving comparisons, we have to make 
certain that our figures are correct. We may be comparing
the value of Imports into the United Kingdom
in 1913 with those in 1931. Now we have to be sure, (1)
that we are dealing with the same kind of goods, i.e.
that the definition of “Imports” has not changed between
the two dates, (2) that we are dealing with the same
geographical area, i.e. that the term “United Kingdom”
means the same in 1931 as it did in 1913 (as a matter of
fact this is not the case owing to the separation of the
South Ireland trade figures from the United Kingdom
figures on the formation of the Irish Free State in 1923),
(3) that the figures themselves are accurate. Similarly
we may desire to review the position as regards Unemploy- 
ment in 1931 in Great Britain, the United States, and 
Germany by comparing the Unemployment Statistics of 
these three countries. Before we can come to correct 
conclusions from such an examination of these figures, we
must consider whether the same kinds of definitions are
used in these countries of the term “Unemployment
Statistics” (as a matter of fact grave difficulties are
encountered in attempts at international statistical
comparisons owing to different interpretations being given,
for administrative or statistical purposes, to simple terms
which, in the opinion of the man in the street, may only
have one meaning).

It is essential to realize, from the start, that in certain
cases the statistical aspect may be subsidiary to other
considerations. For instance, it may be desirable to 
compare the state of civilization of a country with that 
of another, or of one country at different epochs. Such 
a comparison may be attempted by a description of 
summaries of observers’ experiences, or recourse may be 
had to such figures as are available, which appear to have 
relation to the particular problem, such figures as the 
numbers of persons obtaining School Certificates each 
year, or figures giving the numbers of places of worship 
or places of entertainment, or figures giving numbers of 
persons convicted of crime, and so on. The “state of
civilization” is not susceptible of statistical treatment.
We may be pleased to utilize figures of this sort as of
secondary importance, but major consideration would be 
given to general descriptions of the way people live, the 
kind of work they do, how they spend their leisure time, 
what they eat, and so on. 

Similarly in inquiries relating to the social conditions 
of the population, housing statistics, giving, for instance, 
the proportions of houses where overcrowding is deemed 
to exist (a statistical measure of persons per room having 
been achieved and passed), poverty statistics, giving 
proportions of population relieved by the Public Authority, 
and such like, have to be related to other facts, such as 
the type of houses, type of room in the house, location 
of the home in relation to the workplace, location of the
family near relatives, and so on, which may not easily be
susceptible of statistical appreciation. Therefore, when 
we are considering a particular problem, the statistics 
must be given their proper weight, and no more than this, 
in relation to the other factors which are involved and 
which cannot be dealt with statistically. 



Chapter 2 

STATISTICAL INQUIRIES 

The phrase “Statistical Inquiries” is used in the present
instance to cover a large field of action. Statistical data 
emerge in a variety of ways. They may be produced as 
a by-product of certain administrative operations. For 
instance, Imports into the United Kingdom, which are 
subject to duty, are carefully checked on entering by the 
Customs Officials, because the law requires that levies 
should be made on these classes of goods entering into 
consumption in the country. A necessary feature of this 
kind of action is the keeping of records of such entries. 
These records serve as the raw material of Statistical 
Tables, which are later available to the public, giving 
information on the subject of this class of Imports.
Similarly, there are laws relating to Unemployment 
Insurance which have to be administered. In the process 
of administration records are kept, and these records 
again represent raw material of statistics. In such cases 
as these the collection of statistical data is not the primary 
purpose ; it is doubtful if the necessary action would 
be taken if the collection of statistics were all that was 
intended. On the other hand such statistical data 
are immensely valuable, because they give us definite 
information on certain matters of importance in the 
National Economy. 

But statistical data are obtained directly as the result 
of an effort made to get information about certain affairs
of moment. For instance the Census of Population
results in statistics which it is the purpose of the Census 
to obtain. In like manner the Census of Production 
gives statistical data relating to the Output of Industry, 
in a particular period of time, and to other cognate matters, 
and this information is obtained during the course of an 
inquiry, the sole purpose of which is the collection of
these data. 

For present purposes it is useful to consider that the 
words “Statistical Inquiries” refer to both these types
of inquiry, the first where the collection of statistics is
subsidiary to or a part only of the main action, the second 
where the collection of statistics is the only end in view. 
Appreciation of these two kinds of Statistical Inquiry is 
necessary because one of the fundamental factors to be 
considered in handling statistics is their accuracy and their 
meaning. Now, the meaning of statistical data depends 
upon the precise definitions given to certain terms to 
which the figures relate, which may often be technical 
in character, and these definitions may be primarily made 
to assist certain administrative action, or they may 
be made to suit a particular statistical inquiry which is 
undertaken with the idea of obtaining information relating
to certain matters of interest. For instance, the word
“Imports” connotes a certain meaning in the mind of
the ordinary person, he thinks vaguely of all goods coming
into the country from abroad, but “Imports” for
purposes of understanding the Statistical Tables issued by
the Board of Trade means something quite precise and
unambiguous (see Appendix to Chapter 2). Similarly the
words “Unemployed Person” convey a vague impression
to the man in the street, but for the purpose of
administering the Insurance
Acts the Ministry of Labour have devised a workable
definition which assists them in their operations, and 
which resolves the vagueness of the impression which the 
ordinary person has when he thinks of “Unemployed
Persons” (see Appendix to Chapter 2). Similarly in those
investigations conducted
by Professor Bowley and others into Working-Class
Conditions of living in certain towns in 1913-14, 1924-5,
the results of which were published in “Livelihood
and Poverty” and “Has Poverty Diminished?”, and
in the New Survey of London Life and Labour, 1928,
an appropriate definition was invented for the term
“Working-Class” (see Appendix to Chapter 2). The
difference between the two
types of inquiry may be considered as one which influences
the definitions given to terms used; where the statistical
data emerge as subsidiary to the main action the definitions
may be made primarily to suit administration, and where
the statistics emerge as the sole function of the inquiry
the definitions will be made primarily to suit the conditions
of the problem on which the statistical data will throw
light. This emphasis on the meaning of statistical data,
which involves knowledge of definitions used, is necessary
on account of the fact that statistics only serve a useful
purpose when placed in relation to one another for comparisons
to be made. So we need to know, if we are
dealing with spatial comparisons, that the definitions
used are the same in two countries, for instance; or, if
dealing with comparisons involving considerations of
time, that the definitions used are the same at one time
as at another. Now, even if two countries have the same
kind of laws, their administration of them may be different,
consequently the meaning of statistical data which emerge
as a result may be different. Again, a country may modify
a particular series of laws from time to time, thus the
administration may alter and the meaning of statistical
data may be changed, although the same terms be used 
to describe these data. On the other hand, if an inquiry 
has been instituted with the object of ascertaining certain 
facts, and as a consequence particular definitions are 
invented, similar inquiries in other countries or at other 
times may be instituted with the first as a model and the 
same definitions used again. 

With regard to the subject of the accuracy of the 
statistical data, it is obvious that sufficient precision is
required in the results of an investigation, if the information 
obtained is to have any weight in subsequent argument 
or discussion. The accuracy of a final result depends on 
the accuracy of each part which contributes to a total. 
It is reasonable to suppose that less care will be taken in 
ensuring accuracy in statements and figures, if the primary 
use of such is not statistical but (say) administrative, 
where perhaps rough approximations to the truth are 
as useful as closer approximations. It is not meant that 
wilful misrepresentations are made deliberately, but 
merely that, in the stress and turmoil of doing a job, 
a person may not have the time or inclination to check 
every statement and figure, which are accepted at their 
face value, so long as they do not indicate any great 
apparent divergence from what is likely to be correct. 
For instance, when goods are entered into the United 
Kingdom as Free Imports, the Importers render state- 
ments to the Customs House which are checked, but 
for the most part they are assumed correct without 
further inquiry. In the case of Dutiable Goods Imported, 
more scrutiny is of course required. If the sole purpose
of the Custom House Officials were the collection of
statistics a much larger staff would be required in order
that all statements rendered could be properly checked
and absolute accuracy insisted on. Naturally, if the
purpose of an inquiry is merely the search for information
it is more likely that the results will be reasonably accurate.
In general, therefore, we may suppose that the informa- 
tion obtained as a result of the first kind of statistical 
inquiry is not likely to be as accurate as that obtained 
as a result of inquiry of the second kind, that is, an 
inquiry ad hoc. 

The subject of accuracy brings to mind another dis- 
tinction between statistical inquiries. This distinction 
depends on the source from which the information is 
obtained. In some inquiries a large number of persons 
play an influential part, in others a comparatively small 
number are actively involved, and those are selected 
in some way. This distinction is also related to the scope 
of the inquiry. In the Census of Population, for instance, 
each householder is responsible for the information relating 
to the persons in his household and for details as to the 
domicile. The number of householders is very large 
indeed, and therefore the sources from which the informa- 
tion is obtained in the Census of Population are many 
and varied. There are such wide variations in the educa- 
tional standards and intelligence of the householders in 
this country, that great care has to be taken in the wording 
of the questions put on the official form in order to avoid 
the possibility of ambiguity, and even then, if the questions 
are understood, some people may find themselves in 
difficulties over the answering of them. Moreover, this 
wide range of variation in the sources from which the
information is obtained is one reason why the Census
Office only attempts to get knowledge of a comparatively 
small number of matters of interest. The scope of the 
inquiry is necessarily limited on this account. Even with 
the small number of questions asked in the Census, doubt 
must arise as to whether the questions have been answered 
truthfully or not, since no check is possible, except in 
the most obvious cases. There may be no particular 
incentive to supply wrong information, it may be the 
fact that sheer lack of accurate knowledge forces a person 
unwittingly to answer a question wrongly. We conclude
that the accuracy of the final figures, depending on the 
accuracy of the constituent parts, is dependent on the 
number of sources from which the information is obtained. 

On the other hand these sources may be comparatively 
few in relation to the size of the inquiry. These few may 
be skilled investigators whose duty it is to cover the 
ground collecting the information. These persons may 
themselves examine others from whom the information 
is received, but by judicious questioning and cross 
questioning, or by reference to other sources of information, 
they may easily assure themselves of the truth of 
the information collected. Moreover, if adjectival 
descriptions, rather than accurate measurements, are 
given in answer to certain questions, it is more likely 
that these investigators would maintain a reasonable 
standard of value attaching to certain descriptive terms, 
than that the same meaning would be attached to these 
terms by a host of individuals. For instance, School 
Medical Inspections are made regularly, where school 
children are examined by Medical Officers in the service 
of the Educational Authority, and at these examinations
reports are made on the children's height, weight, state
of nurture, condition of teeth, throat, hair, and so on. 
To a certain extent such words as Normal, Good, Poor 
are used in this connection, and it is obviously better 
that such descriptions should be given by the few doctors 
rather than by the many children or parents. Otherwise 
no one would be sure that the information obtained was 
of any use. Further, it is likely that the scope of an 
inquiry can be extended by using investigators who 
survey the field of inquiry, because they are often able to 
elicit information on particular topics, by subtly varying 
the kind of question put, in cases where the usual form 
of question is not properly understood, and by impressing 
the need for such an inquiry on the mind of the person 
from whom the information is desired. Such information 
would perhaps not be obtained from many persons if they 
were solicited by means of a questionnaire, schedule, or 
official form, either because the persons do not understand 
what is being asked, or how to answer, or because they 
do not see any real reason why they should answer. This 
method has been used with success by Professor Bowley 
and his collaborators in the investigations referred to 
previously and has elicited much valuable information 
about details of family life in working-class households. 

A further distinction must be made between different 
kinds of statistical inquiries. The information obtained 
during the course of an investigation concerns a number 
of things, animate or inanimate, and the investigation is 
circumscribed by a set of limiting conditions which deter- 
mine its scope, or the extent of the survey. We may 
say, in brief, that the information sought is that relating
to a group of units, whether these units are human beings,
or cattle, or farms, or crops, or manufacturing concerns,
or machines, or ships, or consignments of goods does not
matter. The problem is to extract useful information
about the group. Here comes the distinction between
different kinds of inquiry. The investigation may be
concerned with the whole of the group or with only a part 
of it. These two kinds of inquiry may be differentiated 
by using the words Census and Sample. In the Census,
the whole field is surveyed, in the Sample only a part of
the field is surveyed. From a census we consider that
we have obtained facts relating to the whole number of
units coming within the scope of the inquiry, and are
under no apprehensions when we seek to make deductions
in general terms on the problem before us. On the other
hand, when we have only a sample inquiry, our deductions
from the facts assembled only relate to this sample and,
if we wish to generalize in terms of the whole group sampled,
we must be sure that our sample is a representative one.
Illustrations of the census type of inquiry are the Census 
of Population, the Census of Production, the Accounts 
of Import and Export Trade. In the Census of Population 
information is desired of all the persons who form part of 
the population, and every endeavour is made to ensure 
that all these units are brought within the scope of the 
inquiry. In the Census of Production 1930, information
was required from all manufacturing concerns employing
more than 10 workpeople, but the Census of 1924 was
extended to all firms, however small. In the Accounts
of Trade information is given concerning all consignments
of goods entering or leaving the country, coming within
the definitions of Imports and Exports. Naturally with 
such large-scale investigations as these, much expenditure 
of time and labour is necessary before the end is reached.
Both these considerations imply expense. There are
three factors to be taken into account, (1) the desirability
of obtaining information, (2) the necessity for this information
to be available to those who would use it within a
reasonable time after the date when the information is
collected, (3) the expense involved in obtaining, collating,
and presenting this information. In the case of the
Census of Population the information has only been
obtained every ten years, and the data and conclusions
are only available some considerable time after the
period to which the information relates. (The final
reports relating to the 1921 Census in Great Britain
and Northern Ireland were published in 1927.) A Census
of Production was made in respect of 1907, 1912, 1924,
1930. The final tables relating to 1924 are now being issued
by the Board of Trade (1932). (The preliminary reports
on the 1930 Census of Production have been available as
supplements to the Board of Trade Journal in 1932 and in
the first part of 1933.) On the other hand, information
relating to the Foreign Trade is available month by
month in considerable detail within a fortnight after the
end of the month to which the data relate, and full details
of the trade of a particular year are published within
two years. This is only possible by using a large staff
of Custom Officials.

At the same time as the Census of Production Inquiry
1924 the concerns which were required by law to render
information respecting their products, numbers employed,
power used, etc., were circularized also by the Board of
Trade, to the effect that they should voluntarily submit
statements of wages paid, and hours worked by workpeople,
etc., during the same period. Not all, but considerable
proportions, responded to this invitation and
there resulted a large amount of useful information on
the subject of earnings and hours worked, relating to this
sample of Industry. It was felt that this sample was
sufficiently representative of the whole so that the facts 
which emerged might be considered as applicable to
Industry as a whole. This is an illustration of what is
meant by a sample inquiry.

There are two kinds of sample inquiry. The one where 
the investigators have no control over the formation of 
the sample, where the net is cast and satisfaction is felt 
with the result of the fishing, the other where the investi- 
gators choose deliberately the particular units which are 
allowed to form part of the sample, and every endeavour 
is made to obtain information about those, and only 
those. This second method is, of course, only possible 
when the limitations of the whole group, of which a sample
is taken, are clearly defined and where such picking and
choosing can be done. In this case an effort is made to
choose a random sample, which from the start will, it is
hoped, be representative. The method is to pick the
individual units out of the whole group by some mechanical
process, which allows every unit the same probability
of entering the sample and where blind chance alone
determines whether one or another is chosen. This
method was used by Professor Bowley and associates in
the investigations referred to previously. The problem 
was the finding of information relating to working-class 
conditions of living in a number of towns. The local 
directory gave for each town the limits of the whole 
group to be sampled. The purely mechanical process 
of turning over the pages of this directory, marking there 
each twentieth address as it occurred gave a twentieth
sample of the whole town. Information was sought
about the families at the marked addresses. This same
method has been used in recent years by the Ministry of
Labour in order to get more detailed information than
had been available previously, respecting that part of
the population coming within the scope of the Unemployment
Insurance Acts, to which the unemployed belong.
Registers of names are kept by the Central Authority,
these were consulted, names picked out at definite intervals,
and those persons assumed to form the sample. Details 
were required of these persons only. This method of 
choosing the constituent parts of the sample gives what 
is hoped to be a random sample of the total group. 
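
The mechanical process of marking each twentieth address may be sketched in a few lines of Python. This is an illustration only and not the investigators' own procedure: the directory of 1,000 addresses, the function names, and the option of drawing the starting position at random are all assumptions made for the sketch. A drawing by lot of the same size is shown beside it for comparison.

```python
import random


def systematic_sample(units, interval=20, start=None, rng=None):
    """Pick every `interval`-th unit from an ordered list.

    `start` is the zero-based position of the first unit taken. If it is
    omitted, a position within the first interval is drawn at random
    (an assumption of this sketch; the text does not specify the start).
    """
    if interval < 1:
        raise ValueError("interval must be a positive whole number")
    if start is None:
        rng = rng or random.Random()
        start = rng.randrange(interval)
    if not 0 <= start < interval:
        raise ValueError("start must lie within the first interval")
    return units[start::interval]


def simple_random_sample(units, size, rng=None):
    """Draw `size` distinct units, every unit being equally likely to enter."""
    rng = rng or random.Random()
    return rng.sample(units, size)


# A hypothetical directory of 1,000 addresses.
directory = [f"address {n}" for n in range(1, 1001)]

# Marking each twentieth address as it occurs: the 20th, 40th, 60th, ...
marked = systematic_sample(directory, interval=20, start=19)

# For comparison, 50 addresses drawn by lot (seeded so the run is repeatable).
by_lot = simple_random_sample(directory, 50, rng=random.Random(1933))

print(len(marked), marked[:3])
```

Both procedures yield a twentieth sample of the hypothetical directory. The first depends on the order in which the addresses are listed, the second does not, and that difference is what makes the arrangement of the list a matter for scrutiny.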

In the first method the investigators have no control 
over the sample and do not know, when the sample is 
obtained, whether it is representative or not. For 
instance, an Insurance Company doing life business 
attracts a certain number of members into insurance; 
from each, certain particulars as to age, family history, 
occupation, and so on are obtained. At any given time 
those entering into contracts with the Company within 
the past year, say, may be considered as a sample of the 
Insuring Class of the whole population, and, if the Company 
knows of no reason why there should be constraining 
forces at work which would tend to make theirs a biased 
sample, this sample is a random one of this class of the 
population. But the onus of proof is on the Company ; 
it cannot be assumed off-hand that the sample is random, 
and if any conclusions were to be drawn from this sample 
which should be assumed equally applicable to the whole 
of this class of the population, then somehow it must 
be demonstrated that the sample does appear to be
representative. This question of the random nature of
the sample must be dealt with before generalizations 
are permitted. 

There are certain statistical tests which may be used to 
determine whether the sample is representative or not, 
and these tests should be applied in all cases of sampling, 
even where the sample has been obtained by mechanical
choosing of those who form part of it, that is, in the case
of those samples obtained by the second method described
above. This is necessary because we are not certain:

(a) that the method of choosing has been operated
correctly,

(b) that there was not some biased order or arrangement
of the units in the whole group from which
the sample is chosen, and that this bias may
not have prejudiced the random nature of the
sample.

With regard to the samples obtained by the first method, 
that is, when the investigators have no control, these 
tests are certainly necessary; they supply the only 
evidence of the random nature of the sample, if it is of this 
kind. These tests will be described later. (See Appendix 
to Chapter 4.) 
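
The danger named under (b) can be made concrete with a small Python sketch. It is not one of the tests referred to above; it merely shows, on invented data, how an arrangement of the list that keeps step with the sampling interval can spoil a mechanically chosen sample. The list of 1,000 houses in which every twentieth entry is a corner house is a hypothetical assumption.

```python
def proportion(flags):
    """Share of True values in a list of booleans."""
    return sum(flags) / len(flags)


INTERVAL = 20

# Hypothetical list of 1,000 houses in which every 20th entry,
# beginning with the first, happens to be a corner house.
is_corner_house = [position % INTERVAL == 0 for position in range(1000)]

# Proportion of corner houses in the whole group.
whole_group = proportion(is_corner_house)

# Two samples of every twentieth house, differing only in where they start.
sample_from_first = proportion(is_corner_house[0::INTERVAL])
sample_from_eighth = proportion(is_corner_house[7::INTERVAL])

print(whole_group, sample_from_first, sample_from_eighth)
```

In this contrived case one sample consists wholly of corner houses and the other contains none, though corner houses form one twentieth of the whole group. Neither sample is representative in this respect, although the method of choosing was operated correctly in both.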

The Sample method of Inquiry has many advantages 
over the Census method. The expenses of time and labour
involved in the Census method are no longer in evidence.
With a comparatively small number of units in the Sample,
the assembly and analysis of the statistics are reduced
considerably, much time is saved, and much labour;
the results of the inquiry are available reasonably soon
after the inquiry is instituted. Moreover, in many cases,
with a sample inquiry, if the sample is not too large,
skilled investigators can be used for the collection of the 
data, and instead of a vast multitude of sources from which 
the information is obtained, with a consequent possible 
lack of accuracy, there is a small number of sources. 
Where a random sample, obtained by the second method 
described above, is possible, it is certainly preferable to 
the Census method, and its use is becoming more extensive. 


APPENDIX 

Definition of Imports 

Quoted from the Monthly Accounts relating to Trade and Navigation of
the United Kingdom.

“The particulars in respect of imported goods from
which the official trade statistics are compiled are allowed 
to be given by Importers or their Agents at any time 
within fourteen days after the arrival of the ship. Further 
extension of time is given within which to make any 
necessary amendments. ... It follows that the statistics 
published for a month do not precisely represent the 
imports . . . which occurred in that period. . . . The 
following classes of goods arriving in this country are not 
included in the Import Statistics : — 

(a) Personal luggage, including parcels brought by
passengers for private use, so long as such parcels do not
contain dutiable goods. Dutiable goods contained in
passengers' parcels are included in the statistics ;

(b) Fresh fish and shell fish of British taking, landed 
from British ships arriving direct from the fishing grounds ; 

(c) Ships’ stores, military and naval stores on board
government vessels, bunkers (coal and oil), and ballast 
of no commercial value ; 

(d) Mats, sacks, cases, etc., used as packages of im- 
ported goods ; 

(e) Goods directly imported by Ambassadors and
Ministers accredited to this Kingdom ; 

(f) Old vessels bought by owners from abroad."


Number of Unemployed 

Each person coming within the scope of the Unemployment
Insurance Acts is supplied with a book, which is
supposed to be lodged with an Employment Exchange
when the person registers as unemployed. The number
of Unemployed at any time is determined by the number
of books so “lodged”.

Working Class 

The New Survey of London Life and Labour undertaken
in 1928 took account of particulars relating to a
large sample of households in London. Those details
referring to Middle Class households were excluded from
the subsequent analysis, so that in effect the definition of
“Working Class” was a negative one. The following
information is given in Volume 3, Appendix I, of the
New Survey :

“Middle Class : Middle class households are distinguished
primarily by the occupation of the head.
But some measure of discretion must be employed.

Special Cases : 

(1) Professional and clerical occupations to be 
ranked middle class. This includes commercial 
travellers, insurance agents, etc. 

(2) All publicans to be ranked middle class. 

(3) Shopkeepers to be ranked middle class unless 
the shop is subsidiary to the work of the principal 
wage-earner, or the income from the shop is 
definitely below £250. 

(4) Self-employers, small employers, master-men, 
etc., not to be ranked as middle class unless their 
incomes are definitely over £250 a year. 

(5) Hawkers, street-traders, etc., to be ranked 
working class. 

(6) Shop assistants to be ranked working class 
unless their work is managerial or supervisory 
(e.g. departmental head or shopwalker) or unless 
their wage suggests middle class rank. 

(7) Police sergeants to be ranked working class, 
inspectors middle class.” 
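The special cases quoted above can be read as a small set of decision rules. The following is only a sketch of how they might be applied mechanically; the names `Head` and `classify`, the occupation labels, and the two discretionary flags are hypothetical and do not come from the New Survey. It assumes that "definitely below" and "definitely over" £250 may be treated as strict comparisons, and that an unknown income satisfies neither.

```python
from dataclasses import dataclass
from typing import Optional

INCOME_LIMIT = 250  # pounds a year, the figure quoted in rules (3) and (4)


@dataclass
class Head:
    """Head of a household (hypothetical record layout)."""
    occupation_group: str                   # e.g. "clerical", "publican"
    annual_income: Optional[float] = None   # pounds a year; None if unknown
    shop_is_subsidiary: bool = False        # rule (3)
    managerial: bool = False                # rule (6)
    wage_suggests_middle: bool = False      # rule (6), investigator's discretion


def classify(head: Head) -> str:
    """Return "middle", "working" or "undetermined" for the head of a household."""
    g = head.occupation_group
    # Rules (1), (2) and the second half of (7).
    if g in {"professional", "clerical", "commercial traveller",
             "insurance agent", "publican", "police inspector"}:
        return "middle"
    # Rule (3): shopkeepers are middle class unless an exception applies.
    if g == "shopkeeper":
        below = head.annual_income is not None and head.annual_income < INCOME_LIMIT
        return "working" if (head.shop_is_subsidiary or below) else "middle"
    # Rule (4): not middle class unless income is over the limit.
    if g in {"self-employer", "small employer", "master-man"}:
        over = head.annual_income is not None and head.annual_income > INCOME_LIMIT
        return "middle" if over else "working"
    # Rule (5) and the first half of (7).
    if g in {"hawker", "street-trader", "police sergeant"}:
        return "working"
    # Rule (6).
    if g == "shop assistant":
        return "middle" if (head.managerial or head.wage_suggests_middle) else "working"
    # Anything else is left to the "measure of discretion" the Survey mentions.
    return "undetermined"
```

The final branch is deliberate: the quoted rules cover special cases only, so a sketch of this kind should decline to decide where the rules are silent.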



Chapter 3 

ASSEMBLING STATISTICAL DATA 

Having dealt with the kinds of Inquiry, we must now 
consider the nature of the information which has been 
obtained. The essential fact which must be appreciated 
is that whatever the inquiry is concerned with, what- 
ever the nature of the inquiry, the information collected 
refers to a number of individuals or units of some kind 
or another, which together form the group with which 
the inquiry is concerned. The units may be households, 
persons, houses, sheep, farms, ships, consignments of 
goods, mines, firms, and so on. Every unit possesses 
a number of peculiarities, characteristics, or attributes, 
and the units possess these in a variety of ways or degrees, 
and it is the varying nature of the possession of these 
characteristics by different units which enables us to 
distinguish between them. The number of these 
characteristics is large, some of them are susceptible of 
measurement, some are merely accorded an adjectival 
description. For instance, a unit in a group may be a 
male person, aged 26 years, 5 ft. 9 in. in height, 10 st. 9 lb.
in weight, engaged as a clerk with a particular firm,
dwelling in a certain house, having an income of £150
a year, married with one child, wearing on the 1st of July,
1931, a blue suit, green tie, black shoes, and having blue 
eyes, black hair, and so on. When we exhaust all this 
person’s characteristics we isolate him from others. The 
sum total of his characteristics enables his friends to know 
him apart from others and establish his identity as a unit.
Similarly, a unit in a group may be a ship, using steam as 
motive power, engaged in the Foreign Trade, registered 
under the British flag, arriving at Hull on the 21st July, 
1931, with a particular cargo, of a certain tonnage, and 
carrying so many in the crew, and so on. The sum total
of these characteristics determines the vessel’s identity.
So with a manufacturing concern, it is situated in a 
particular place at a certain time, it is engaged in a certain 
trade, it employs so many hands, it uses so much horse- 
power, it has so many machines, it has so many storeys, 
so many windows, so much floor space, and so on. These 
illustrations indicate sufficiently what is meant by 
peculiarities, characteristics or attributes, and the fact 
that some are indicated by a number, e.g. height, and 
some by a description, e.g. eye-colour. Now, in a given 
inquiry we are generally only concerned with a number of 
these characteristics, we may not, for instance, be interested 
in persons’ heights, but we may be interested in their 
incomes ; on the other hand, in other inquiries we may 
be concerned with heights and not with incomes. The 
information collected, then, in the course of an inquiry 
concerns a number of units in a group, defined and limited 
in some way or another, and of these units we obtain 
details as to the possession of one or more, but probably 
not all, of their characteristics. The limits of the group 
in the inquiry are determined by the units all possessing 
some one or more characteristics in the same manner or 
degree. For instance, in the Census of Population in 
England and Wales in 1931, the group coming into the 
inquiry consists of those persons in England and Wales 
alive on the night of April 26th, 1931, and excludes all
those units who do not possess the characteristics of
human beings, it also excludes those human beings who
although alive were not in England and Wales on that
night. All members of the group possess therefore
certain characteristics in like manner.
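The idea of a unit as a bundle of characteristics, of a group as those units sharing certain characteristics, and of an inquiry as recording only some of the characteristics, may be sketched as follows. The record layout, the field names and the second person are hypothetical; only the age, height and income of the first person follow the illustration in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """One unit; each field is one characteristic (hypothetical layout)."""
    sex: str                                     # descriptive
    eye_colour: str                              # descriptive
    age_years: int                               # numerical
    height_inches: int                           # numerical
    income_pounds: int                           # numerical
    in_england_and_wales_on_census_night: bool


def census_group(people):
    """Keep the units that share the characteristic which limits the group."""
    return [p for p in people if p.in_england_and_wales_on_census_night]


def extract(people, characters):
    """Record only the characteristics with which the inquiry is concerned."""
    return [{c: getattr(p, c) for c in characters} for p in people]


people = [
    Person("male", "blue", 26, 69, 150, True),      # 5 ft. 9 in. = 69 in.
    Person("female", "brown", 41, 64, 0, False),    # invented second unit
]
group = census_group(people)
rows = extract(group, ["sex", "income_pounds"])
```

Here `rows` holds the sex and income of each member of the group and nothing else, which is the position of an inquiry interested in incomes but not in heights.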

The information, so described, concerning these units 
is supplied to us in the shape of filled-in schedules or forms, 
or as the contents of card index boxes, or as ledgers, or 
day-books, and so on. Such may be called the raw 
material of statistics. What are the processes which
this raw material goes through before the finished article
is produced ? These processes are sometimes called
“Statistical Methods”, and the finished products
“Secondary Statistics” as opposed to “Primary
Statistics”, the result of tabulating the statistical
material in its crude form before entering the statistical
mill. What is the first process to which these crude data
must submit ? Obviously to a checking of the accuracy
of the information supplied, a careful scrutiny which
should establish the existence of any obviously wrong
pieces of information, and correction where such seems
necessary. After this, when it is felt that the information
is trustworthy, the next step is to assemble and condense. 
No one can appreciate at a glance or even after careful study 
hold in the mind the information contained in a hundred 
or a thousand or more schedules ; no one, by turning over 
page after page of a book containing information respecting 
many units, can get a proper notion of the detail contained 
there. It is essential that some process of condensation should
take place. This process results in Statistical Tables.
This tabulation necessarily involves the grouping together
of units into classes. The kind of tables produced and
the grouping into classes are both determined by the
nature of the information obtained, i.e. by the particular
characteristics possessed by the units with which the 
investigation was concerned. Broadly, we may say that 
the process of assembly and condensation groups together 
those units which are alike in respect of certain 
characteristics. Detail is necessarily lost, the individual 
unit becomes merged in a group. Few of us can identify 
ourselves in the Census Tables, for instance, each of us 
realizes that he or she is merely one of a large number in 
a particular table. We have to consider what is involved 
in this process of grouping together like with like, the 
process called Classification. 
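The two steps just described, scrutiny of the crude data followed by condensation into classes, may be sketched in a few lines. The schedules below are invented for illustration, and the plausibility limits (a recognized sex, an age between 0 and 120) are assumptions of this sketch and not rules given in the text.

```python
from collections import Counter

# Hypothetical filled-in schedules, one per unit.
schedules = [
    {"sex": "male", "age": 30},
    {"sex": "female", "age": 28},
    {"sex": "male", "age": 300},    # obviously wrong: to be queried
    {"sex": "female", "age": 55},
]


def plausible(schedule):
    """Scrutiny: flag entries that are obviously wrong (assumed limits)."""
    return schedule["sex"] in {"male", "female"} and 0 <= schedule["age"] <= 120


checked = [s for s in schedules if plausible(s)]
rejected = [s for s in schedules if not plausible(s)]   # set aside for correction

# Condensation: the individual unit is merged in its class.
table = Counter(s["sex"] for s in checked)
```

The rejected schedules are set aside and not discarded, since the text asks for correction where it seems necessary; only the checked schedules enter the table.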

Classification is determined by the characteristics 
possessed by the individual units. These characteristics 
may be considered as of two kinds, (1) those which may
be referred to as descriptive and (2) those which may be 
referred to as numerical, being susceptible of quantitative 
appreciation. In the first kind are such characteristics 
as sex, civil condition, occupation, etc., kind of trade in 
which employed, in the case of persons ; kind of goods 
carried in the case of ships ; type of industry, type of 
power used, in the case of factories, and so on. In the 
second kind are such characteristics as age, height, income, 
rent paid, and so on, in the case of persons ; tonnage, 
number of crew, in the case of ships ; value of products, 
wages bills, rents, and so on, in the case of factories. Some 
of the characteristics of the first kind may be very easily 
classified by means of some natural or physical lines of 
demarkation, and these natural or physical differences 
have determined the classes into which units possessing 
this character should be placed. It is easy in these cases 
to determine whether two units are alike or not in respect
of this character. In this category, for instance, is sex 
in the case of persons ; kind of motive power in the case 
of ships, whether sail or machines ; similarly in the case 
of vehicles, whether mechanically propelled or horse- 
drawn ; and so on. In these cases the method of classi- 
fication is obvious, and naturally advantage is taken of 
this when units are to be grouped together, like with like. 
But there are many characteristics where this classifica- 
tion is not so easily achieved. For instance there are 
characteristics which are possessed in varying degrees, 
which merge into one another when any grading is 
attempted, such as eye-colour in persons. We may 
classify eyes as brown and blue, but find cases where the
colour is more properly described as grey, we may also
decide that we should have a dark and a light brown class,
also possibly green, and finally arrange a scale of classes,
dark brown, light brown, green, grey, blue. But when we
attempt to fit our units into these five classes we may have
doubtful cases which we hesitate to describe as dark or
light brown, others where it is difficult to decide whether
the colour is grey or blue. There are really a large number
of eye-colours and if we fix on a definite number of classes,
we shall always have border cases where it is not easy to
decide to which class such cases belong. So in the case of
classifying commodities according to the state of manufacture
at which they have arrived when they are being
bought and sold. We think vaguely of raw materials 
and manufactured goods, we recall to mind raw cotton, 
raw wool, bedsteads, clothing, motor-cars. But when 
we survey the whole range of goods which enter into 
trade we realize that there are many grades between raw 
materials at one end of the scale and manufactured goods
at the other end of the scale. For instance, consider
the case of wood. Shall we reserve the words “raw
material” to be applied only in the case of trees standing
untouched in the forest, or when the tree has been felled,
or when the branches have been lopped off, the trunk
alone remaining, or when the log has been dragged to the
saw mill, or when it is sawn into planks, or when the
planks are sawn into standard lengths ? In the sense that
any one process of manufacture turns raw materials into 
finished goods, then wood in all these states is at the 
same time raw material and finished article. The same 
kind of problem arises when we consider the stages, iron 
ore, pig iron, steel, rails. Which are raw materials and 
which are finished goods ? After considering problems 
of this kind, in the end no strict classification into these 
two groups is really attempted. In the accounts of the 
Foreign Trade of the United Kingdom, for instance, there 
are two main groups which are defined as “Raw Materials
and Articles Mainly Unmanufactured” and “Articles
Wholly or Mainly Manufactured”, and for each class
of commodity there are further subdivisions into goods of
each class which range from raw material to finished
product, no definitions involving processes of manufacture
being attempted ; the distinctions made are those
due to description of the goods.

We find, therefore, that in certain cases we can group 
like with like because there are fundamental distinctions 
between units, in other cases where classification is 
difficult we have to be satisfied with grouping together 
those units which are nearly alike, but we realize that there 
are probably cases where a unit is placed in one class 
with others to which it is akin, and does not differ by
much in respect of the particular character from another
unit which is placed in a neighbouring class. If we want
to condense the original information into manageable 
proportions we have to be satisfied with this state of 
affairs. The same sort of difficulty arises when considering
classification of units according to those characters
of which we can get a quantitative appreciation. In
certain cases there is no doubt of the limits of the classes 
which are used. For instance, if we are classifying households
according to the number of persons, these numbers
range 1, 2, 3, 4, 5, and upwards. We have no difficulty
in saying that one household is like another in this respect, 
they both contain the same number of persons. Similarly 
goods trains can be classified according to the number of 
wagons attached. On the other hand, there are many and 
various ways in which particular characters may be 
possessed by individuals. The heights of persons, for 
instance, are of indefinite variety, so also are ages, 
or wages. The same is true in the case of tonnage of ships. 
Where we are dealing with measuring as against counting
we find an infinite number of possibilities, and here again 
in classification no attempt is made to ensure that like 
with like go together in one class ; this is impossible, all 
that is done is to put those units together which are 
nearly alike in respect of the particular character. Thus 
we group together those whose ages are 20 and more but 
less than 25, those who are 25 and more but less than 30, 
and so on. We group together those whose heights are 
5 ft. 6 in. and more but less than 5 ft. 9 in., those 
5 ft. 9 in. and more but less than 6 ft., and so on.
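The rule "so much and more but less than so much" can be made mechanical, as the following sketch suggests. The function name `class_label`, the list of ages, and the continuation of the age limits beyond 30 are assumptions made for illustration; heights are expressed in inches so that 5 ft. 6 in., 5 ft. 9 in. and 6 ft. become 66, 69 and 72.

```python
from bisect import bisect_right
from collections import Counter

AGE_LIMITS = [20, 25, 30, 35]      # lower limits of the age classes, in years
HEIGHT_LIMITS = [66, 69, 72]       # lower limits of the height classes, in inches


def class_label(value, lower_limits):
    """Name the class whose lower limit is the largest one not exceeding value."""
    i = bisect_right(lower_limits, value) - 1
    if i < 0:
        return f"under {lower_limits[0]}"
    if i == len(lower_limits) - 1:
        return f"{lower_limits[i]} and over"
    return f"{lower_limits[i]} and more but less than {lower_limits[i + 1]}"


ages = [20, 24, 25, 29, 31, 19, 40]            # invented ages
age_table = Counter(class_label(a, AGE_LIMITS) for a in ages)
```

A person aged exactly 25 falls in the class "25 and more but less than 30" and never in the class below it, so that no unit can be counted twice or omitted at a boundary.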

The question of definition is important in classification. 
In the first place, the question arises as to the definition
of the units, whether the scope of the inquiry is clearly 
defined so that all those units, which should form part 
of the whole group to be considered, are included. 
Secondly, the characters about which information is 
obtained are to be clearly defined. Thirdly, the extent 
to which the units possess these characters must be defined, 
so that we get those together which are really alike, when 
we perform the necessary grouping for purposes of
tabulation. Let us take an illustration of the difficulties
which arise from the Census of Population. The problem 
is to get information relating to the whole population 
of the country at a particular time, part of this information 
is to be concerned with some of the environmental con- 
ditions. It is decided that the unit, for the most part, 
shall be the household and that the head of the house- 
hold shall be responsible for supplying the relevant data. 
Census forms are therefore to be distributed to all the
heads of households in the country. This procedure 
appears to be simple when we imagine an official pro- 
gressing slowly along street after street of a town, knocking 
at the doors and leaving the appropriate form in the 
correct hands. But the Census official’s experience is 
greater than ours, he has experience of finding a house 
with two distinct families living there, and he has to
have guidance as to whether that is one household or
two. If he does not ask for help from those above him,
he may decide to leave one form, whereas another official
faced with the same problem solves it differently and 
leaves two forms. This sort of procedure is damaging 
to the accuracy of the data and must be avoided. The 
Census official must therefore have help in deciding which
groups of people living together are households for Census
purposes and which are not. In other words the Unit
of the Inquiry has to be defined efficiently so that no
difficulties arise. The Census office gives instructions 
that the unit household is that group of persons living 
together in a “Structurally separate dwelling place”,
a place from which access may be had to the public streets 
without interfering with or interference from another 
group of persons. Again, in the Census, information 
is required as to occupation. In recent years, owing to 
the prevalence of Unemployment in the country, careful 
instructions were given so that people could describe 
themselves properly under this characteristic heading
“Occupation”. If a man had been a carpenter but had
not worked at his trade for a considerable time, but had 
been employed for brief intervals in this period of 
unemployment as a market-gardener or a labourer, how 
should he describe himself on the Census Schedule ? 
The instructions given for the filling up of the Census 
Form in 1931 were to the effect that if a person were 
unemployed, but had previously been occupied as a 
carpenter, he should describe himself as Carpenter 
(Unemployed), but where a person had no hopes of further 
employment in his previous occupation, and had engaged 
himself in a new one, he should describe himself as 
belonging to the latter category. 

Finally, on the Census Form careful instructions are 
given to assist persons in the filling up of the form so that 
vague descriptions of occupation may be avoided, in order 
that, when the subsequent grouping of like with like 
takes place, there is confidence that persons who are alike 
in respect of this character have described themselves in 
the same way.

This emphasis on definition is obviously necessary, 
otherwise when the process of assembling the data is to 
be performed, this may be found almost impossible, or 
the exact meaning of the tables finally obtained may be
obscure. Further, the definitions used in an inquiry 
should accompany the tabular presentation of results 
in order to render their meaning perfectly clear. 

To summarize, the information obtained as the result 
of an inquiry may be represented thus : — 

Inquiry relating to place X at time Y.

Units                Characters
           A     B     C     D     E     F
  1        A1    B1    C1    D1    E1    F1
  2        A2    B2    C2    D2    E2    F2
  3        A3    B3    C3    D3    E3    F3
  4        A4    B4    C4    D4    E4    F4

X and Y need definition ; the Units need definition ;
A, B, C, . . . need definition.

Unit 1 possesses character A to the extent A1, character
B to the extent B1, and so on. Unit 2 possesses character
A to the extent A2, and so on.

For example : — 

Inquiry relating to London in the month of January, 1933.

Persons   Sex      Age   Wage per week   Civil Condition   Occupation
   1      Male     30    53s.            Married           Taxi-driver
   2      Female   10                    Single
   3      Female   28    33s.            Married           Waitress
   4      Male     55    50s.            Married           Railway Porter



London must be defined ; there are many “Londons”:
the area administered by the London County Council, 
the Metropolitan Police Area, and so on. Is the month 
of January, 1933, the calendar month or the four weeks 
ending 28th January ? Under “ Wages ” are these 
figures average weekly earnings for the month, are they 
gross or net earnings, that is, do they include or exclude 
insurance and mutual benefit contributions, tips, allowances
for uniform and travelling, and so on ? Is a person
to be described as Married who is living apart from his 
wife ? 

Note that it is possible that a unit in the group may 
not possess a particular character at all. 
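The London illustration, including the unit which possesses no wage and no occupation, might be held in the following form. This is a sketch only; the field names are hypothetical, and the use of `None` to mark a character not possessed is a choice made for this example.

```python
# The four persons of the illustration; None marks a character not possessed.
persons = [
    {"sex": "Male", "age": 30, "wage_shillings": 53,
     "civil_condition": "Married", "occupation": "Taxi-driver"},
    {"sex": "Female", "age": 10, "wage_shillings": None,
     "civil_condition": "Single", "occupation": None},
    {"sex": "Female", "age": 28, "wage_shillings": 33,
     "civil_condition": "Married", "occupation": "Waitress"},
    {"sex": "Male", "age": 55, "wage_shillings": 50,
     "civil_condition": "Married", "occupation": "Railway Porter"},
]

# Only units which possess the character can enter a table relating to it.
wages = [p["wage_shillings"] for p in persons if p["wage_shillings"] is not None]
without_wage = sum(1 for p in persons if p["wage_shillings"] is None)
```

Keeping the absent wage distinct from a wage of nothing matters: a child of ten who earns no wage is not the same thing as a wage-earner whose earnings were nil in the period.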

In a subsequent classification of occupations into 
groups such as Skilled and Unskilled, in which 
category will such as Taxi-drivers, Waitresses, Railway 
Porters go ? 

From these pieces of information in the raw state 
tables are obtained, where the units are grouped 
together. 

Tables are intended to summarize the information 
obtained during the course of the inquiry, consequently 
we never expect these tables to present the whole 
of the information obtained. Since this is the case,
it naturally follows that, owing to the factors of time
and labour involved in assembling and condensing the
statistical data, a certain amount of choice is necessary
as between different matters of interest, so that tabulation
is concerned primarily with those particular aspects of
an inquiry which will serve most usefully an immediate
need. The original material should be preserved, so that 
it is possible to proceed to further tabulation if the necessity
arises in the future, owing to a transfer of interest to some 
other problem which previously had not been considered. 
Since the tabular presentations of statistical data are the 
first evidences, available to a wider public than the
investigators, of the results of an inquiry, the information 
contained therein should be perfectly clear and precise, 
that is, explanations should accompany the tables, where 
necessary, in order that there should be no possibility 
of ambiguity in interpretation of the meaning of the 
tables. For example, in the Annual Report of the 
Secretary for Mines for the year ended 31st December, 
1928, Table 25, in the Statistical Appendix relating to 
Outputs, Costs of Production, Proceeds, and Profits of 
the Coal Mining Industry during 1928 contains many 
explanatory notes. In the first place the information 
relating to South Wales and Monmouthshire is that for 
the year ending 31st January, 1929. A note explains 
the sources from which the information is obtained — 
“The particulars are based partly upon the returns 
made for the purpose of wage ascertainments for certain 
districts, and partly upon other returns supplied by 
individual Colliery Owners.” The figures in the table 
do not refer to the whole Industry but to 96 per cent 
in the case of the third and fourth quarters of the year 
and to 97 per cent in the other quarters. There are 
explanations of what are included in the items “Wages”
and “Other Costs of Production”.

Tables give information relating to the variety of ways
or degrees in which units in a group, which are alike in
certain respects, possess one or more characteristics.
A simple table contains data respecting one characteristic,
the information relating to others not being included. A
more complex table may at the same time contain figures
relating to several characteristics. In any of these cases
those characteristics possessed in like manner by all the
units should be referred to in the heading or description
of the table. For instance, in the Table


Ages of Males Married in England and Wales in 1925

Ages     Under 21   21-      25-       30-      35-      45-      55 and upwards   Ages not stated   Total
Number   12,011     97,797   100,???   37,?19   26,095   11,203   8,801            1,913             295,689


The 295,689 persons have these characteristics in
common : they are of the same sex, they married in the
year 1925 in England and Wales ; so they are differentiated
from Females, and from those who did not marry in that
year in England and Wales. On the other hand they
were not all of the same age. Their age distribution
is indicated in the above table, where there are grouped
together as like in respect of this character those whose
ages are nearly the same, all those aged 30 and over but
less than 35, for example, being placed together in the same
group. But these men possess many other characteristics,
occupation, height, hair-colour, income, and so on ; no
information on these points is given in this table. If
further information were available [...], we could group
them in respect of these other characters. For instance we
might have a more complicated table showing Age and
Occupation in this way.




Ages                     Under 21   21-   25-   30-   35-   45-   55-   Age not stated   Total
Agriculture
Coal Mining
Metal Manufacturers
Textiles
Chemical Manufacturers
Others
Total

Each figure in this table would represent a group alike 
in certain respects, but with individuals differing in other 
respects and, if further information were available, we 
could proceed to further tabulation in this way : — 
Distribution of Earnings of Male Coal Miners aged 25
and over but less than 30 who Married in England and
Wales in 1925.

                          Earnings per Week
Unemployed   Under 20s.   20s.-   25s.-   30s.-   35s.-   40s. and over   Total

We should then have a number of tables of this kind, 
the headings of which would indicate in what respects 
the units in the group were alike.
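A table of two characteristics, such as the Age and Occupation table outlined above, amounts to counting the units in each pair of classes and then adding along rows and columns. The sketch below uses five invented records, since the cells of the specimen table are blank; the variable names are hypothetical.

```python
from collections import Counter

# Invented bridegrooms: (occupation group, age class).
records = [
    ("Coal Mining", "21-"),
    ("Coal Mining", "25-"),
    ("Agriculture", "25-"),
    ("Textiles", "21-"),
    ("Coal Mining", "25-"),
]

cells = Counter(records)                            # body of the table
row_totals = Counter(occ for occ, _ in records)     # right-hand "Total" column
col_totals = Counter(age for _, age in records)     # bottom "Total" row
grand_total = sum(cells.values())
```

Each cell, say Coal Mining aged 25 and under 30, is itself a group alike in two respects, and could be tabulated further by a third character such as earnings.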



In those cases where the number of classes into which 
units may be grouped in respect of a particular
characteristic is large, the amount of information 
contained in one table is necessarily limited by the size 
of the paper on which the table is presented, but where 
the number of classes is small, one table may be used easily 
to give information respecting a number of characteristics. 
For example the students attending an educational 
institution may be divided up into Male and Female, 
Day Students and Evening Students, First, Second, or 
Third Year, and they may be classified according to the 
Course they are pursuing. We may get in a simple table 
the information relating to these four characteristics 
in this way : — 


Students Attending the Institution in the Session 1930-1 





A table of this kind is presented as a regular feature of 
the Statistical Abstract issued by the Board of Trade 
relating to Shipping engaged in Foreign Trade. The 
table gives “ Total Net Tonnage of British and Foreign 
Vessels, distinguishing Sailing and Steam, Entered and 
Cleared, in the Foreign Trade at Ports in the United 
Kingdom, with Cargoes and in Ballast and with Cargoes 
only ”, for 15 years. The units (ships) possess these 
four characteristics as to Flag, motive power, Entering 
or Clearing, and the state of the hold, in varying ways, 
and the information respecting these matters is easily 
compressed into a single table in the Abstract.

The construction of a table is determined to some 
extent therefore by the kind of classification adopted for 
the purpose of grouping together those units which are 
alike in respect of a particular characteristic. 

In some cases the figures in the tables refer to the 
number of units, such as in the illustrations above, where 
the number of males marrying is quoted, or the number 
of students, but in other cases the figures in the tables 
refer to some further character, the number of units not 
being shown at all. For instance, in the Shipping Tables 
mentioned the numbers in the Table are total net tonnage 
and not number of vessels. For the purpose of indicating 
the quantity of shipping engaged in the Foreign Trade 
the total net tonnage is of more interest than the total 
number of vessels. Similarly in the Tables referring to 
the Foreign Trade, the figures are those relating to the 
total of quantities and values of consignments and not to 
the number of such. So in the Tables relating to the 
Coal Mining Industry, the figures generally give Output 
of Coal in tons and the Number of Men working and not
the number of mines, though where this number is of 
interest, e.g. in Tables relating to the use of Mechanical 
Appliances in Coal Mines it is given. 
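The distinction between a table of numbers of units and a table of totals of some further character can be sketched with invented shipping figures. The flags and tonnages below are hypothetical and are not taken from the Statistical Abstract.

```python
from collections import defaultdict

# Invented vessels: (flag, net tonnage).
ships = [
    ("British", 1200),
    ("British", 800),
    ("Foreign", 3000),
]

number_of_vessels = defaultdict(int)    # table of counts of units
net_tonnage = defaultdict(int)          # table of totals of a further character

for flag, tons in ships:
    number_of_vessels[flag] += 1
    net_tonnage[flag] += tons
```

In this invented case the two tables give opposite impressions, two British vessels against one Foreign, but 2,000 tons against 3,000, which is why the heading of a table must say which kind of figure it contains.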

To a considerable extent the detailed form in which 
the tabulated material is presented depends upon the
persons responsible for the construction of the tables, 
and the factors which influence them are the uses which 
such tables are to serve. That is to say a table is con- 
structed in the light of some problem or matter of interest. 
For instance, the following table summarizes the informa- 
tion relating to age at death of persons dying in Great 
Britain in 1925 : — 


Age Distribution of Persons Dying in Great Britain in 1925

Age in years   Under 5       5-       15-      25-      35-      45-      55-      65-       75 and over   Total
Number         [illegible]   15,5??   22,829   24,410   33,388   53,443   77,300   104,882   1??,865       538,348

Such a table may serve satisfactorily to convey roughly 
the information available as to age at death, but a person 
interested particularly in Infantile Mortality would find 
a table of this kind much more useful : — 


Age Distribution of Persons Dying in Great Britain in 1925

Age in years   Under 1   1-       2-       5-       15-      25-      35-      45-      55 and over   Total
Number         62,746    17,279   15,304   15,642   22,320   24,410   33,388   53,143   208,607       538,348


Actually he would probably dispense with the greater 
part of this table and concentrate on the first group and 
use a table : — 



Age Distribution of Infants under one year Dying in
Great Britain in 1925

Age in Months   Under 1   1-       3-      6-      9-      Total
Number          26,810    10,446   9,668   8,096   7,726   62,746


We find many different kinds of grouping of ages in 
tables giving age distributions, these differences being 
due to the different material tabulated and the variety
of uses to which the information is put when tabulated. 
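Regrouping a table is a matter of adding adjacent classes, and the totals supply a ready check. The sketch below takes the infant figures quoted above and confirms that they add to the total of 62,746; the further grouping into two broader classes is an illustration added here and does not appear in the source.

```python
# Infants under one year dying in Great Britain in 1925, by age in months.
infant_deaths = {"Under 1": 26810, "1-": 10446, "3-": 9668, "6-": 8096, "9-": 7726}

under_one_year = sum(infant_deaths.values())

# A coarser grouping of the same material (hypothetical choice of classes).
under_three_months = infant_deaths["Under 1"] + infant_deaths["1-"]
three_months_and_over = (infant_deaths["3-"] + infant_deaths["6-"]
                         + infant_deaths["9-"])
```

Whatever grouping is adopted, the classes must still add to the same total, so that a disagreement at this point indicates an error of transcription or of classification.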

This ever-present possibility of variations in the group- 
ing exists wherever we are dealing with a character which 
can be measured. We can distinguish between two types
of tables in this connection. On the one hand there is 
the kind of table shown in official returns where as much 
detail is to be presented as is possible, consistent with 
space in a book and expense, and where it is anticipated 
that this table will primarily serve the purpose of stating 
facts, and will be the basis of argument and discussion on 
the part of persons interested. In such a table we should 
expect, for instance, that an age distribution would show 
ages in yearly groups. On the other hand, there are those 
tables, perhaps derived from official returns, where great 
detail is not essential, and in fact where too much attention 
to detail would mar the appearance of the table, and where 
those interested are expected only to gather the general 
idea of the variation existing in a group in respect of a
particular character. In such cases the individuals put 
together in a group perhaps range over a wide interval 
of the variable character. As an extreme instance of this 
we may consider a simple table of this kind : — 


Population of England and Wales, 1921

Age in years        Number
Under 15        10,500,455
15-64           25,095,139
65 and over      2,291,105
Total           37,886,699


Here attention is drawn to these particular age-groups 
representing, more or less accurately, that part of the 
population of active working life, that part requiring to 
be supported before they become “ units of production ”, 
and that part retired from active participation in pro- 
duction. The purpose of the table is to enable comparisons 
to be drawn between these three figures and the total. 
This purpose is reasonably well served by such a table, 
and certainly would not be suited if the table giving the 
age distribution of the population at every year of age 
were used. In the latter case we should not be able to 
see the wood for trees. 
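
The comparison of the three figures with the total can be made explicit by expressing each as a percentage of the total. The short Python sketch below does this with the figures of the table above; the variable names are illustrative only and form no part of the original text.

```python
# Population of England and Wales, 1921, in three broad age-groups
# (figures taken from the table above).
population = {
    "Under 15": 10_500_455,
    "15-64": 25_095_139,
    "65 and over": 2_291_105,
}

# The three groups should add up to the printed total.
total = sum(population.values())

# Each group as a percentage of the total, to one decimal place.
shares = {group: round(100 * n / total, 1) for group, n in population.items()}

for group, share in shares.items():
    print(f"{group:<12} {share:5.1f} per cent")
```

Roughly two-thirds of the population falls in the group of working age, a fact which is seen at once from the percentages and only with some effort from the original numbers.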

When we are dealing with tables conveying information 
respecting a character, possessed by the units, which 
is only susceptible of descriptive classification, the same 
kind of distinctions may be drawn between “ official 
tables ” for information only, and those tables which have 
a place in an argument or discussion, and which are 
supposed to help to elucidate or clarify a particular point. 
In the former case the tables again should contain as 
much detail as possible, and the practice in this respect 
should be guided by preceding tables showing the results 
of parallel investigations in previous years, in the first 
place, because the previous practice was presumably 
concerned with emphasis on matters of interest and, in 
the second place, because one of the most necessary
purposes of statistics is to enable comparisons to be 
made over a period of time. Of course, if circumstances 
have changed, as they are continually changing, the 
form of the tables will necessarily be modified so as to be 
consistent with these changes, but as far as possible the form 
should be kept close to the preceding form so that com-
parisons will not be vitiated, where these are possible. In the
latter case the tables should be as simple as possible 
consistent with the material, and a golden rule is to use 
two tables instead of one if there is the least fear that 
one would contain such a mass of material that the 
essential facts therein would tend to be hidden. Briefly 
we may say, that after the information is obtained in its 
crude form, it should be tabulated in as detailed a manner 
as is considered necessary for the particular data; these
tables which result are the primary statistics of the 
investigation. Subsequently from the primary tables 
others may be obtained to serve particular purposes in 
order to emphasize this or that aspect of the problem with 
which the investigation is concerned. If the inquiry is 
a simple one, it may well be the case that no further tables 
are necessary after the preliminary tabulation has been 
performed, especially if the investigators are quite clear 
as to what statistical tables are required. But in the 
case of large-scale investigations, like those instituted by a 
Government Department, the statistical data might be 
used for many purposes besides those to which they are 
immediately put by the particular department, con- 
sequently the preliminary tabulations should be as 
detailed as possible so that they may be of value to 
later investigators. 


Chapter 4 

SECONDARY OR DERIVED STATISTICS 

The information obtained by an investigation is now 
available in the form of tables. These tables are useful 
to us in that they enable comparisons of various kinds 
to be made. The purpose of tabulation is partly to 
present the facts elicited by an inquiry, purely as facts,
and partly to present these facts in a manner suitable for 
the appropriate comparisons. Sometimes a table contains 
so many figures, and the comparisons which are of interest 
are those relating one particular figure to many others, 
that it is not easy at a glance to appreciate all the informa- 
tion contained in the table. Consequently subsidiary 
tables may be constructed from the original, which serve 
to emphasize more clearly the matters of interest, and 
possibly arithmetical processes may be applied to the 
tables in order to bring these out, or again graphical 
methods may be employed to attain this end. The 
application of such arithmetical processes, which may
be described as statistical methods, results in what may 
be called secondary statistics. 

One of the simplest methods of making comparisons
between figures is that of ratios. The ratio method is 
extensively used in statistical analysis in simple or com- 
plicated form and produces secondary statistics which 
are given different descriptions in different contexts, 
such as percentages, averages, rates, weighted averages, 

42 



B 

a> 


C 4 

<y> 


S 


c 3 

a 


O 


•2 

*3 


cD<x><z>co®a>; 2 -;rSj$S 2 Z^rSS«ao 

c^ irT CO o 05 oo ^5 52 S 0 

S S ^ S S 5o S O^S S t/5 CO CO 


o 

CD 


OO 

oT 


i i 2 s s i s qs 8.3 s ? s 

-rcot^'‘c>aoc»--? 5 «sar:?:::;;=; 2 SS 


•rtW^oo — .- 5:00 — oo^cg — ^ 


CO _ - 

OO CD CO 

CD oo 


coocr^c^CD^ooo 

Dl C^ C^ ^ 05 <D 


OO tr5 
C^ <N 


r^coo — QC — tD^00C00505®O00C0 

SJS^c^t^c^cDc^coco-^r^ojr^S^ 

CO CD C^ CD o^ r^ o^ 

ocT cT ocT ocT cd co o co co -c^ o ^ ^ 55 

S®”«^ 0 >r^--io — 05 joao 5 -^« 

QO tx. CD CD CD ^^05 00 CD */5 ^ CO Dl 


05 

CO 

to" 

O 

OO 


o 

CO 

Di 

05 ^ 

05 

cd' 


— COOC 4 '^OOCDO>: 2 *‘O^OOr^COJD^ 
CD^ric^-^oScDi/iOO^lO^^OCDO 
CO 05 05 CD C^CD^'^^O^C^I 
irrao'orCc^QOr«^‘^t^qf«t^'oc^i^co 
uocor^or^D<uoco 05 Uoco 05 *-^ 
OOr^CDCD'^CO'— 'OOO 


CD ^ ^ 


OO 05 OO 

C <1 — — 


CO 

s 

OO 

c^ 

t^ 

in 


•2 

e 

45 


GO 

OO 


45 

73 


r^r^^coc^^omcocor^cDr^ocsto 
0--iOCDr^-* — t^co® — CDCDDJDJ^ 
0400 — 05 OO 

CtT GO^ go 00^ CD ^ CD CD CO CD r>* o o 
— SoScsiOCOCSia0050505 
t^U0C0D4C<IO05t^t^CDU0^C0C^’^« 


CO 

try 


CO 

CO 


CO 


r^ 0 a 005'^00 05 ’^*-< 00 a 000 »/^ 0 ic 00 
l/5t^COCDlOr^UOD4t^Oi005lO'0‘COOO 
CDU0C^DICOC*IC4O5O5ii0 1^05 ID UO C^CD 
t^coc^coci^o^cit^try^Q^co try 

UOCDOCD’— ■OO^'^r^'^GOGO^COUO'^ 
•OiGOr^CD^O'^COCOC^ — ^ 


c^ 

05 ' 

CO 

CO 

C 4 


c 9 

4 > 

bo 

< 


1/5 

•o 

a 

;=) 


f 


45 

> 

o 


I I I I I I I I 

OU 0 OW 5 O>lDOl/ 5 OiDOU 0 O *0 
«^C 4 C^COCO^'^lDl/ 5 CDCDl>* C 

C$ 

ID 


s 

O 

H 


44 ELEMENTARY STATISTICAL METHODS 

index-numbers. The purpose in every case is the same, 
to reduce the comparable figures to such proportions that 
the appropriate comparisons are more easily made. 
Suppose, for instance, that we were concerned with such 
a table as that on page 43 which gives information of the 
age distribution of the male and female population of 
England and Wales at three census dates 1881, 1901, 1921. 

This table serves the purpose of emphasizing the 
increasing numbers in the population in 40 years, it 
emphasizes the numerical superiority of females over 
males, and the larger number of young persons compared 
with old persons. But much remains hidden until drawn 
out by means of secondary statistics. We cannot get a 
proper appreciation of the fact that the constitution of 
the population is in a state of flux, until we have a table of
derived statistics which results from making comparisons. 
It is appropriate to compare any one figure in this table 
with several others, for instance, we can compare the 
number of Male Children under 5 years in 1881 with the
total Males in 1881, with the number of Females under 5 in 
that year, and with the number of Males in that age-group 
in 1901 and 1921. These comparisons will be related to 
three different matters of interest to those concerned with 
the size and distribution of the population. We therefore 
construct new tables giving ratios in the form of per-
centages, as in Tables I (a), I (b), and I (c).

Table I (a) emphasizes the age constitution of the 
population by concentrating on the proportions in different 
age-groups. For instance we appreciate, by glancing at 
this table, the large part of the population under 15 years 
of age. Further, this table enables comparisons in this 
regard to be made between the male and female population
and different years.

TABLE I (a)

Proportionate Age Distribution of Population in England
and Wales in 1881, 1901, 1921

                       1881              1901              1921
Age (years)        Male  Female      Male  Female      Male  Female
Under 5            13.9    13.2      11.8    11.1       9.3     8.3
5-                 12.4    11.8      11.1    10.4       9.8     8.8
10-                11.1    10.5      10.6     9.9      10.2     9.2
15-                10.0     9.6      10.2     9.8       9.6     9.0
20-                 8.8     9.1       9.4     9.8       8.0     8.6
25-                 7.8     8.0       8.4     8.9       7.4     8.2
30-                 6.6     6.8       7.4     7.6       7.1     7.7
35-                 5.9     6.0       6.6     6.6       7.1     7.4
40-                 5.3     5.4       5.7     5.7       6.8     7.0
45-                 4.3     4.5       4.8     4.8       6.4     6.3
50-                 3.8     4.0       4.0     4.1       5.4     5.3
55-                 3.0     3.2       3.2     3.3       4.3     4.3
60-                 2.7     2.9       2.6     2.9       3.3     3.4
65-                 1.8     2.0       1.8     2.1       2.5     2.7
70-                 1.3     1.4       1.2     1.5       1.6     1.9
75 and over         1.2     1.4       1.2     1.5       1.4     2.0
Total               100     100       100     100       100     100

TABLE I (b)

Number of Females per 100 Males in each
Age-group in England and Wales

Age (years)        1881      1901      1921
Under 5           100.3     100.3      97.5
5-                100.7     100.5      99.2
10-                99.7     100.0      99.2
15-               100.9     101.9     102.7
20-               109.3     111.9     117.6
25-               108.7     112.6     120.9
30-               107.7     110.0     118.6
35-               106.9     107.4     115.6
40-               107.9     106.2     112.7
45-               110.3     107.0     107.0
50-               110.4     111.4     107.4
55-               111.1     111.6     108.6
60-               113.7     117.0     110.7
65-               117.0     123.0     119.4
70-               121.1     128.3     134.2
75 and over       130.7     141.1     158.9
Total             105.5     106.8     109.6

TABLE I (c)

Number of Males and Females in each Age-group in 1901, 1921
as Percentages of those in 1881 in England and Wales

                       1881              1901              1921
Age (years)        Male  Female      Male  Female      Male  Female
Under 5             100     100     105.6   105.6      95.7    93.0
5-                  100     100     110.9   110.7     112.6   111.0
10-                 100     100     119.2   119.5     131.0   130.4
15-                 100     100     126.7   128.1     136.2   138.8
20-                 100     100     132.4   135.6     130.2   140.1
25-                 100     100     135.4   140.3     136.6   151.9
30-                 100     100     137.8   140.7     152.5   167.9
35-                 100     100     138.9   139.5     171.0   184.8
40-                 100     100     133.4   131.2     181.8   189.7
45-                 100     100     138.8   134.7     212.3   206.0
50-                 100     100     131.0   129.2     199.9   194.5
55-                 100     100     130.2   130.8     204.6   200.0
60-                 100     100     120.5   124.1     176.5   175.9
65-                 100     100     122.0   128.2     194.1   198.1
70-                 100     100     123.5   130.9     177.2   196.4
75 and over         100     100     125.8   135.7     171.9   208.8
Total               100     100     124.5   126.0     143.0   148.6

As between male and female, we note,
for instance, that the figures for females are less than those 
for males in each age-group under 20 years, and more 
in the age-groups over 20 years. This is true for the 
three years 1881, 1901, 1921. Except in the latter year 
in respect of age-groups 45-, 50-, the age distribution of 
the female part of the population is definitely different 
from that of the male part of the population. This fact 
certainly cannot be appreciated by glancing at Table I. 
Tables I (b) and I (c) further bring out the contrast between
male and female. Women appear to live longer than do 
men, consequently the proportions of females to males 
in the older age-groups is higher than in the younger 
age-groups. This is brought out in Table I (b). Further,
comparisons are possible between 1881, 1901, and 1921, and
any changes with time in this respect may be therefore
observed. The effect of the War is apparent in the
proportions of females to males in the age-groups 20-45
in the 1921 figures of the table.

TABLE II

Numbers Engaged in Certain Industries in Great Britain,
1911 and 1921. Occupied Males aged 10 Years and Over

                                            1911                   1921
Industries                            Number    Per-         Number    Per-
                                                centage                centage
Fishing                               63,000     0.5         63,000     0.5
Agriculture                        1,301,000    10.1      1,198,000     8.8
Coal and Shale Mining              1,122,000     8.7      1,294,000     9.5
Manufacture of Bricks, Cement,
  Pottery, and [illegible]           162,000     1.2        161,000     1.2
Manufacture of Chemicals,
  Explosives, Paints, Oils,
  Rubber, etc.                       145,000     1.1        195,000     1.4
Manufacture of Metals, Machines,
  Implements, and Conveyances      1,670,000    12.9      2,251,000    16.5
Manufacture of Textiles              585,000     4.5        540,000     4.0
Manufacture of Cottons               257,000     2.0        234,000     1.7
Manufacture of Wool and Worsted      118,000     0.9        115,000     0.8
Manufacture of Silk                   11,000     0.1         14,000     0.1
Manufacture of Flax, Hemp, Jute,
  Rope, [illegible], and Canvas
  goods                               40,000     0.3         33,000     0.2
Manufacture of Dyeing, Bleaching,
  Printing, and Finishing             91,000     0.7         89,000     0.6
Total Occupied                    12,930,000     100     13,656,000     100

Table I (c) concentrates
on the growth in population in the four decades and we 
may note particularly, for example, that whereas the
female population as a whole has increased in 40 years
by about 50 per cent, the older part of it, including those 
over 45, has increased by about 100 per cent. It is not the 
object, here, to discuss these figures at great length, 
the emphasis is laid on the fact that, in order to appreciate 
the information contained in a table like I above, it is 
necessary that subsidiary tables like I (a), I (b), and I (c)
should be prepared. This method of procedure is general. 
If we are concerned with a table which consists merely 
of two or three figures, it is possible that the relationships 
between them may be sufficiently evident without further 
calculations, but where a table involves a mass of figures, 
it is practically always necessary to reduce these, somehow, 
to ratio form. It is therefore usual in presenting tables 
of this kind to include in the tabulated informa- 
tion, the results of calculations of this nature, in order 
that those wishing to obtain a proper appreciation of the 
figures may do so without somewhat burdensome calcula-
tions. As an illustration Table II may be
cited. 
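
The reduction to ratio form described above can be sketched in a few lines of Python. The example below takes three rows of Table II and works out both kinds of ratio used in this chapter: each number as a percentage of the total occupied, and the 1921 number as a percentage of the 1911 number (the form adopted in Table I (c)). The function name `percentage` is an illustrative choice, not something taken from the text.

```python
# Occupied males in three industries, Great Britain (figures from Table II).
# Each entry holds the numbers for (1911, 1921).
numbers = {
    "Agriculture": (1_301_000, 1_198_000),
    "Coal and Shale Mining": (1_122_000, 1_294_000),
    "Manufacture of Textiles": (585_000, 540_000),
}
total_occupied = (12_930_000, 13_656_000)


def percentage(part, whole):
    """Ratio of part to whole expressed per cent, to one decimal place."""
    return round(100 * part / whole, 1)


for industry, (n1911, n1921) in numbers.items():
    share_1911 = percentage(n1911, total_occupied[0])  # share of all occupied, 1911
    share_1921 = percentage(n1921, total_occupied[1])  # share of all occupied, 1921
    index_1921 = percentage(n1921, n1911)              # 1921 as a percentage of 1911
    print(f"{industry:<26} {share_1911:5.1f} {share_1921:5.1f} {index_1921:6.1f}")
```

The first two columns printed agree with the percentage columns of Table II; the third is a further derived statistic of the same kind, showing at a glance which industries grew and which declined.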

The effect of such calculations as those which have been 
illustrated is to enable comparisons to be made by reducing 
the figures we are concerned with to what may be considered
as more manageable proportions, or to reduce a series of 
figures to a common denominator. 

The same kind of notion is in mind when we calculate 
those ratios which are known as averages. Here we are 
generally concerned with the ratio of numbers which are 
expressed in different units, and we find how much of 
one quantity would accrue to each individual unit of the 
other quantity in the whole group, if the distribution 
were equal as between individuals. For instance, in 
the following table figures are given respecting importation
of certain articles for home consumption in the United 
Kingdom : — 


TABLE III

Home Consumption of Imported Articles into United
Kingdom, 1911, 1921

                                           1911                           1921
Articles                         Quantities         Quantities   Quantities         Quantities
                                                    per head of                     per head of
                                                    population                      population
Butter                           cwt.   4,167,140   lb.  10.31   cwt.   3,329,418   lb.   7.91
Wheat, Grain, and Flour in
  equivalent of Grain            cwt. 111,497,962   lb. 275.86   cwt.  99,184,732   lb. 235.74
Eggs                             thou.  2,265,894   No.  50.06   thou.  1,263,660   No.  26.82
Beef, Fresh and Refrigerated     cwt.   7,315,333   lb.  18.10   cwt.  10,972,014   lb.  26.08
Mutton and Lamb, Fresh and
  Refrigerated                   cwt.   5,322,159   lb.  13.17   cwt.   6,811,617   lb.  16.19
Bacon and Hams                   cwt.   5,681,307   lb.  14.06   cwt.   6,255,717   lb.  14.87

Estimated Population                   45,268,000                      47,123,000


In this table comparisons are made between the 
imported quantities and the total population, the result 
of such comparisons being given as “ average per head 
of population”. The effect of the changing population 
is thus eliminated and comparisons between the average 
in 1911 and 1921 enable effective deductions to be drawn 
as to the changing volume of imports over this period of 
years. The average presents a crude picture of the amount 
imported for consumption, by indicating what each 
individual person in the whole of the United Kingdom 
would receive if these imports were distributed evenly. 
Similarly, we speak of the average wage of a group of men, 
obtained by dividing the total wages received by the 
total number of men who receive these wages, and this 
average is a figure representing what each would have 
if the amount was divided equally. So also, we work
out the average rent of a group of houses, dividing the
total of rents by the number of houses. An average is,
then, obtained as a ratio in the form Numerator ÷
Denominator, where the Numerator represents the
total extent to which a particular characteristic is possessed
by the whole of a group, and the Denominator represents
the total number in the group, or it may be the total
extent to which another characteristic is possessed by
the whole of the group. For instance, in railway statistics
the average wagon load is calculated for a particular
area of railway operation over a certain period of time.
This is obtained by dividing the “ton miles”, which is
got by summing the results of multiplying each item of
freight by the distance hauled, by the “loaded wagon
miles” obtained by totalling the distance moved by
each wagon (loaded). So, in the Coal Mining Industry,
the average output of coal (in tons) per manshift worked
is calculated for particular areas over periods of time,
by dividing the output by the total number of manshifts
worked. In both these cases neither the numerator
nor denominator are totals of the original units, which
are freight trains in the first example and coal mines in
the second, but are totals of characters possessed by these
units.
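
As a check on the idea of an average as Numerator ÷ Denominator, the per-head figures of Table III can be recomputed. The sketch below does so for butter and eggs; it assumes the imperial hundredweight of 112 lb., and the names `per_head` and `LB_PER_CWT` are illustrative only.

```python
LB_PER_CWT = 112  # pounds in one imperial hundredweight

# Figures from Table III.
population = {1911: 45_268_000, 1921: 47_123_000}   # estimated population
butter_cwt = {1911: 4_167_140, 1921: 3_329_418}     # butter, in cwt.
eggs_thousands = {1911: 2_265_894, 1921: 1_263_660}  # eggs, in thousands


def per_head(total_quantity, persons):
    """Average quantity per person, to two decimal places."""
    return round(total_quantity / persons, 2)


for year in (1911, 1921):
    butter_lb = per_head(butter_cwt[year] * LB_PER_CWT, population[year])
    eggs_no = per_head(eggs_thousands[year] * 1000, population[year])
    print(year, "butter", butter_lb, "lb. per head;", "eggs", eggs_no, "per head")
```

The numerator must first be put into the unit in which the average is to be quoted (pounds, or single eggs); the division by population then removes the effect of the change in population between the two years.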

Sometimes the ratios obtained when making comparisons
are called “rates” and may be expressed as “rate per
cent” or “rate per mille” or “rate per thousand”.
Whichever of these is used is merely a matter originally
of convenience, and latterly of precedent. We may take for
instance birth rates and death rates, which are obtained
by relating as numerator to denominator the births or
deaths in a given period (usually a year) to the total
population, and expressing the result as a “rate per
thousand”. These derived statistics serve to indicate
the natural increase and decrease of the population,
and enable comparisons to be made from one time to
another.
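
A rate of this kind is simply a ratio multiplied by a convenient base. The following sketch shows the calculation; the numbers of births, deaths, and population are invented for the illustration and are not figures for any actual country or year, and the function name `rate_per` is likewise an assumption.

```python
def rate_per(events, population, base=1000):
    """Events per `base` of population, to one decimal place."""
    return round(base * events / population, 1)


# Hypothetical figures, chosen only to illustrate the arithmetic.
births = 915_000
deaths = 610_000
population = 50_000_000

birth_rate = rate_per(births, population)    # per thousand
death_rate = rate_per(deaths, population)    # per thousand
natural_increase = round(birth_rate - death_rate, 1)

print(birth_rate, death_rate, natural_increase)
print(rate_per(births, population, base=100))  # the same ratio as a rate per cent
```

Changing the base alters only the position of the decimal point, not the information conveyed, which is why the choice between “per cent” and “per thousand” is a matter of convenience and precedent.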

Now that we have arrived at the stage of performing 
arithmetical calculations on our statistics some discussion 
is appropriate on the subject of accuracy. It is unlikely, 
when we are finding the ratio of any one number to another, 
that we shall get the quotient, as we do in easy arith- 
metical exercises, exactly without remainder, and the 
question naturally arises as to how many decimal places 
the answer is required ; in other words, what degree of 
accuracy do we want in the result. This question can 
never be answered in set terms, the answer depends on 
the data which are to be submitted to this process and on 
the particular problem. For instance, the figures in 
Table I (a) were given to one decimal place, and by doing 
so the table quite satisfactorily serves its purpose ; no 
advantage would accrue if the percentages had been worked 
out to two places of decimals. But if this table were to 
serve another purpose than that in this context, it might 
be better if greater accuracy were required ; if, for instance, 
the figures in that table were themselves to submit to 
arithmetical processes of division, then those figures 
should be calculated to two or three decimal places. In 
some cases, what matters is the extent to which figures 
given as a result of calculations can be appreciated by 
others for whom the work is intended. For instance, a 
number of things may be divided up into three groups 
in this way:

Group          A     B     C     Total
Percentage     28    54    18    100

This table adequately represents the grouping. If
these percentages were shown in this form:

Group          A        B        C        Total
Percentage     28.32    53.87    17.81    100


no advantage is gained. In fact there is a loss of efficiency 
on the part of the table as a vehicle of expression of a 
result, because persons reading the table have to sub-
stitute 28 for 28.32, 54 for 53.87 and 18 for 17.81 when
trying to understand it, and there is no doubt that this 
process of approximation is gone through. On the other 
hand, the table might be performing two functions at 
the same time, bringing out the different ways in which the 
total is distributed through these three groups, and at 
the same time emphasizing the fact that as far as two of 
the groups are concerned the distribution is nearly the 
same. For instance, a table of this kind : — 


Group          A        B        C        Total
Percentage     29.17    28.64    42.19    100

might be better from this point of view than

Group          A     B     C     Total
Percentage     29    29    42    100

On occasions, it is more important to consider the 
number of significant figures to which a result is given 
rather than the number of decimal places involved. After 
all, the number of decimal places can readily be altered 
by changing the wording of the phrase of which the ratio 
we are considering is a part. Thus instead of speaking 
of a birth rate of 18.3 per thousand of the population, we
can say a birth rate of 183 per ten thousand, and the
decimal place disappears. The total value of Imports
into the United Kingdom in 1925 is given as £1,320,715,190
to the nearest £. This figure might be given as
£1,321,000,000 to the nearest million, or to four significant
figures. The birth rate in England and Wales in 1925
is 18.3 per thousand (to three significant figures). The
percentage figures given in Tables I (b) and I (c) are given
to four significant figures. The accuracy of a result is
indicated roughly by the number of significant figures
involved. The figure 18.3 quoted above as the birth rate
in 1925 may be any number from 18.250 to 18.349
inclusive; we could only find out where it actually
is within this range by recalculating this rate from the
original figures of births and population. Thus roughly,
at the outside, using 18.3 instead of a more correct figure
may involve an error of .05 in excess or .05 in defect,
i.e. .05 in relation to 18.3, or about 3 in a thousand. In
statistical work the number of significant figures in a
result is indicated by the number of figures quoted, thus
in Table I (c) the figure 131.0 occurs, the ratio has been
calculated to four significant figures, the last being 0.
If the ratio had been calculated to three significant figures
only it would have been expressed as 131. Similarly, in
large numbers such as those in Table II giving numbers in
different industries, the figures there are approximate
only, to the nearest thousand, and this approximation is
indicated by the presence in each of three zeros.
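
Rounding to a stated number of significant figures, as distinct from decimal places, may be sketched as follows. The helper `round_sig` is a hypothetical function written for this illustration; the two figures used are those quoted in the paragraph above.

```python
import math


def round_sig(x, figures):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0
    # Position of the leading digit: 0 for units, 1 for tens, and so on.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, figures - 1 - exponent)


imports_1925 = 1_320_715_190          # value of imports in pounds, as quoted
print(round_sig(imports_1925, 4))     # to four significant figures

# Greatest relative error implied by quoting a birth rate as 18.3:
# half a unit in the last place, i.e. .05, in relation to 18.3.
error_per_thousand = round(1000 * 0.05 / 18.3, 1)
print(error_per_thousand)
```

The second result, a little under 3 in a thousand, is the figure the text describes as “about 3 in a thousand”.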

But sometimes the degree of accuracy in the result 
of calculating ratios is not merely determined by the 
context to which this result relates, but by the degree of 
approximation involved in the figures which form the 
numerator and denominator of the ratio. If these two 
figures are only accurate to a certain degree of approxima- 
tion, the resulting ratio is certainly only approximately 
correct within certain limits. Thus, if numerator and 
denominator are both 2 (one significant figure), the ratio 
obtained (1) is not correct to this figure, because the
numerator and denominator may be any numbers between,
roughly, 1.5 and 2.5, unless we have other information
giving these figures to two significant figures, and the
ratio will be somewhere between 1.5/2.5 and 2.5/1.5, i.e. between
0.6 and 1.7. Thus to one significant figure the ratio is
either 1 or 2. In the same way if we are obtaining the
ratio of 3.4/2.5, where numerator and denominator are given
to two significant figures, the result 1.36 must be con-
sidered together with the outside limits which this ratio
might achieve, if 3.4 stands for any number between 3.35
and 3.45, and 2.5 stands for any number between 2.45 and
2.55. These limits are 3.35/2.55 and 3.45/2.45, i.e. 1.314 and 1.408,
1.36 being nearly half way between them. To say that
the result is 1.36 is certainly wrong because this implies
precision in the result of a degree lacking in the original
figures, the result might be 1.38 for instance. To say
that the result is 1.4 is wrong, because the result might
be 1.33 which is expressed as 1.3 to two significant figures.
All we can do is to say that the result is 1, or we can say
1.36 ± .05, in this way indicating the limits to which
the ratio might reach when the result is indicated to two
places of decimals, or we might say 1.36 with a possible
error either way of 3½ per cent.
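
The outside limits of a ratio whose numerator and denominator are themselves approximate can be worked out mechanically. The sketch below reproduces the two examples of the preceding paragraph; the function name `ratio_limits` and its arguments are illustrative, and positive figures are assumed throughout.

```python
def ratio_limits(numerator, denominator, half_unit):
    """Smallest and largest values the ratio can take when each figure may
    be wrong by up to half_unit in either direction (positive figures)."""
    lowest = (numerator - half_unit) / (denominator + half_unit)
    highest = (numerator + half_unit) / (denominator - half_unit)
    return lowest, highest


# Numerator and denominator both given as 2, to one significant figure.
low1, high1 = ratio_limits(2, 2, 0.5)
print(round(low1, 1), round(high1, 1))

# 3.4 divided by 2.5, each given to two significant figures.
low2, high2 = ratio_limits(3.4, 2.5, 0.05)
print(round(low2, 3), round(high2, 3))

# Possible error either way, as a percentage of the quoted result 1.36.
print(round(100 * (high2 - 1.36) / 1.36, 1), round(100 * (1.36 - low2) / 1.36, 1))
```

The lowest value arises when the numerator is at its least and the denominator at its greatest, and the highest value in the opposite case; this is why the limits are not placed quite symmetrically about 1.36.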

Two important matters emerge from this discussion 
of the accuracy of ratios. In the first place, if we are 
content with percentages, averages, rates, and such like
containing only a small number of significant figures, 
and in many pieces of statistical work these are quite 
satisfactory for the purpose, then we do not need to insist 
on absolute accuracy in the numerator and denominator. 
For instance, in a particular quarter in the Coal Mines 
of Great Britain 61,833,281 tons of coal are raised by men 
working 58,218,785 man-shifts in that period and the
average output per man-shift is given as 21.24 cwts.
We can equally well obtain this figure from the information
that 61,833,000 tons were raised, the number of man- 
shifts being 58,219,000. Now, the number of decimal 
places in this result (21*24) is quite enough for practical 
purposes, therefore from the practical statistical point 
of view we are content to know the output and the number 
of man-shifts to the nearest thousand. This is important 
because we may well imagine that the original figures of 
this kind may be subject to slight errors in counting, 
perhaps a ton of coal has escaped attention, and from 
the practical point of view when making comparisons 
between these large numbers we do not mind if this kind 
of error has entered so long as it does not amount to much. 
Consequently, whereas in accounting every unit figure has
a place because items and totals must check, in statistics, 
where the figures are used for purposes of comparison 
and where the comparisons are effected by making ratios, 
since absolute accuracy is not essential in these, round 
numbers or approximations are good enough in the 
original figures. This does not mean that we should not 
try for accuracy in the original data; not at all, accuracy
is necessary, but we need not quote these numbers to 
all the significant figures which are involved in the originals, 
the arithmetical labour is thereby reduced. 
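
The coal-mining example can be verified directly. The sketch below computes the average output per man-shift first from the full figures and then from the figures rounded to the nearest thousand, taking 20 cwts. to the ton; the function name is an illustrative choice.

```python
CWT_PER_TON = 20  # hundredweights in one ton


def output_per_manshift(tons, manshifts):
    """Average output in cwts. per man-shift, to two decimal places."""
    return round(tons * CWT_PER_TON / manshifts, 2)


exact = output_per_manshift(61_833_281, 58_218_785)    # original figures
rounded = output_per_manshift(61_833_000, 58_219_000)  # nearest thousand

print(exact, rounded)
```

Both calculations give the same result to the two decimal places quoted, which is the point of the paragraph: the last three digits of each original figure contribute nothing to the ratio at this degree of accuracy.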

In the second place, on many occasions we find that 
difficulties are involved in obtaining absolutely accurate 
figures in the first instance. For example, errors are 
likely to enter into figures of the Import and Export trade ; 
we cannot be absolutely certain that all the persons in
the country have been enumerated in the Census, or that 
some person has not been counted twice; we cannot 
absolutely rely upon every householder giving correctly 
the number of living rooms in his house, thus the total 
number of rooms may be wrong ; and similarly in any 
investigation difficulties arise which make certain of the 
figures in a published table suspect. Moreover there 
are cases where the information used is not obtained 
directly as the result of an investigation ad hoc, but in-
directly through some other means. For instance, the
cost and trouble of a Census of Population prohibits its 
being taken at very frequent intervals, but for many
reasons it is useful to know what the population is sub-
sequent to the last Census, and it is interesting to fore-
cast the future population of a country. Consequently, 
estimates of the population are made each year, and these, 
being estimates, do not pretend to accuracy, they are
given to the nearest thousand. Similarly estimates are 
made each year by the Ministry of Labour of the numbers 
of workers insured under the Unemployment Insurance
Acts, these estimates being used in the Calculations of 
the Unemployment Percentages issued monthly. These 
estimates are to the nearest ten, they do not pretend 
to accuracy. Similarly, the Board of Trade publishes 
annual estimates of the imports and exports of 
services (as contrasted with goods) which enter into the 
Balance of Trade. Now in all estimates of this kind it 
is no use pretending to be accurate, consequently any 
ratios calculated with such a figure as a member cannot 
pretend to accuracy beyond a certain number of significant 
figures, limited by the number of significant figures in the 
estimate. The same applies, if it is felt that although 
accuracy has been urged in an investigation, this has not 
resulted. It is therefore fortunate that, in practice, we 
are content with our resultant ratios to a small number of 
significant figures. 

Round numbers or approximations are therefore used 
extensively in statistical work, especially when large 
numbers are involved, and these limit the amount of 
arithmetic involved in calculations and yet give us results 
which are exact enough for practical purposes. But, when 
round numbers are used, care must be taken that the 
resulting ratios are not worked out to a greater degree 
of apparent accuracy than the original figures warrant. 

Note : — When logarithms are used in the calculation 
of ratios, 4-figure logarithms give results correct to 3 
significant figures, 5-figure logarithms give results correct 
to 4 significant figures. 
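
The effect of the number of figures in the logarithms can be seen in a small experiment. The sketch below imitates a printed table only in the sense that each logarithm is rounded to a fixed number of decimal places before use; a real table would also involve interpolation, which is ignored here. The function `ratio_by_logs` is hypothetical, and a single example illustrates the rule stated in the note without proving it.

```python
import math


def ratio_by_logs(a, b, places):
    """Divide a by b by subtracting logarithms rounded to `places` decimals."""
    log_a = round(math.log10(a), places)
    log_b = round(math.log10(b), places)
    return 10 ** (log_a - log_b)


# Coal raised and man-shifts worked, in thousands (from the example above).
exact = 61_833 / 58_219
four_figure = ratio_by_logs(61_833, 58_219, 4)
five_figure = ratio_by_logs(61_833, 58_219, 5)

print(f"{exact:.5f} {four_figure:.5f} {five_figure:.5f}")
```

In this instance the 4-figure result agrees with the direct quotient in its first three significant figures and the 5-figure result in its first four, as the note leads us to expect.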



APPENDIX 

Tests of Random Sampling 

In Chapter 2 we mentioned the possibility of testing 
whether a sample was representative of the group from 
which the sample was obtained. 

Each unit in the whole group possesses a number of 
characteristics (say) A, B, C, . . . Information respecting
a number of these (say) A, C, P, Q, R, is obtained from
those units coming within the scope of the sample. Now 
some information of this kind is probably available con- 
cerning the whole group from which the sample is taken. 
For instance we may know all about characters A and C. 
If this is the case, we can compare the sample with the 
whole group as far as these two are concerned. This 
would be in the nature of a comparison of averages or 
ratios. We would compare (say) the proportion in the 
group possessing character A with that in the sample 
with this character. Or we might find the average amount 
of C in the whole group and compare it with the average 
amount of C in the sample. If the sample is representative 
then these proportions or averages should be the same or 
nearly the same within reasonable limits which can be
calculated. (The theoretical considerations involved in
the determination of these limits are too complicated to
be dealt with here.) 

This testing of the sample should always be possible,
because, if we know sufficient to be able to identify the
units in the whole group, we are likely to have some
knowledge of certain characters possessed by this group.
We can always arrange that, when sampling, we should
obtain the information respecting these characters in
addition to the other information we want from the
units in the sample.

For instance, suppose we are taking a sample of house- 
holds in a certain town. In the sample there will be a 
certain proportion of school children. Now the Local 
Authority possesses information respecting the total 
number of these in the town, and the proportion of school 
children to total population can be calculated. The 
proportion in the sample should agree within certain 
limits with this figure. Or suppose we are taking a sample
of the Insured Workers, there should be in the sample
the same or nearly the same proportions of persons in
different Industries, Coal Mining, Building, etc., as there 
are in the whole group of Insured Workers. 
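
The school-children check can be sketched as a comparison of two proportions. All the figures below are hypothetical, as are the variable names. The sketch reports the difference only, since the calculation of the permissible limits is not dealt with here.

```python
def proportion(count: int, total: int) -> float:
    """Proportion of a group possessing a given character."""
    return count / total

# Hypothetical figures, not taken from any real inquiry.
town_population = 80_000
town_school_children = 12_400    # known to the Local Authority
sample_population = 2_150        # persons in the sampled households
sample_school_children = 341

group_p = proportion(town_school_children, town_population)
sample_p = proportion(sample_school_children, sample_population)
difference = sample_p - group_p

print(f"group {group_p:.3f}, sample {sample_p:.3f}, difference {difference:+.3f}")
```

Whether a difference of this size lies within reasonable limits is a separate question, to be settled by the theory of sampling.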

(See Livelihood and Poverty and Journal of the Royal 
Statistical Society, 1924, p. 544 ; 1928, p. 519.) 



Chapter 5 

COMPARISON OF AVERAGES 


The ratios which are calculated with the idea of 
effectively comparing one figure with another, and which 
are called percentages, averages, and so on are themselves 
subject to comparisons later. In all the illustrations 
given, such comparisons of secondary or derived statistics 
are possible. For instance we compare death rates in
one town with those in another, or we compare death
rates over a period of time. We may compare the import
of wheat per head of population with the import of beef,
or the import of wheat per head over a period of time.
It is important to realize that we are concerned not only
with the secondary statistics, but with the primary
statistics also. This fact is often forgotten. We must
remember that $(\text{Ratio})_1$ compared with $(\text{Ratio})_2$ means

$$\frac{(\text{Numerator})_1}{(\text{Denominator})_1} \quad\text{compared with}\quad \frac{(\text{Numerator})_2}{(\text{Denominator})_2}.$$

By concentrating on the ratios we are apt to forget that they
were derived from numerators and denominators, and that
these refer to groups of units. Now ratios may change
for a variety of reasons and one of them is, that the groups
themselves may change in constitution. We may take an
illustration from the Coal Industry. One of the "efficiency
indicators" in this Industry is "Output per Unit of Labour"
or average output per manshift worked. Now this average
may increase for the whole of the industry from one year
to another. This might be due to more efficient use of
man-power, say, the increased use of machines for
coal-getting. But it
may also be due to a change in the number of mines 
working. Suppose, at the later date the relatively less 
efficient units of production are no longer operating, but 
that in any mine which is in operation at both periods 
the same output per manshift is obtained, then the
man-shifts worked which form $(\text{Denominator})_2$ will be
man-shifts in the relatively more efficient mines, and the second
ratio will be greater than the first. Thus, the change in 
the ratio may merely be indicative of a difference in the 
constitution of the group as between the two periods 
under review. As another illustration, consider a concern 
in the catering trade which sells meals, chocolates, tobacco, 
wines. A useful secondary statistic to calculate from time 
to time is the ratio of money received by sale of its goods, 
to the money expended on purchases of the raw materials 
at wholesale prices. Such a ratio must be high to allow 
for salaries, wages, heating, lighting, depreciation, and 
so on. Now the margin of profit which is possible will 
vary as between different classes of goods sold, the highest 
will be on those articles of food which submit to cooking 
operations and other services, and the lowest on (say) 
tobacco, the retail price of which is determined by outside 
competition and agreements with wholesalers from one 
time to another. Then the ratios worked out in the 
manner described might show changes which are due to 
changes in the volume of consumption of the different 
classes of goods, and may not be due to any change in 
efficiency of management, whereas it may be the original 
purpose of these ratios to reflect such changes as the 
latter. Let us illustrate this with a numerical example. 
Suppose at one time we have the following figures:

          Food    Chocolate  Tobacco    Wine     Total   Per cent
Cost     £1,000     £100      £200      £700    £2,000     100
Sale     £3,000     £125      £230    £1,260    £4,615     232


At a later time the following figures are obtained:

          Food    Chocolate  Tobacco    Wine     Total   Per cent
Cost     £1,000     £100      £300      £600    £2,000     100
Sale     £3,000     £125      £345    £1,080    £4,550     227


The change from 232 to 227 is due entirely to the figures 
making the totals, and the margins of profit obtained in 
the various trading departments are unchanged. Similarly, 
difficulties arise when comparing death rates of one 
community with those of another. It is well known that
old people are more likely to die in a given year than 
younger people, that very young infants are more likely 
to die than children who have survived the first year. 
The death rate of any community is partly determined 
by the age constitution of the group, and a difference 
between two death rates may be due in part or wholly 
to the different proportions in different age-groups in 
the communities, the death rates of which are under 
discussion. Similarly the average consumption of tobacco 
per head of population may change from one period to 
another partly because the proportion of the population 
who are smokers may have changed considerably. 
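
The catering illustration can be checked with a short sketch. The cost and sale figures are those of the two tables. The function names are hypothetical. Worked from the totals as printed, the ratios come to roughly 231 and 227.5 per cent, close to the 232 and 227 quoted in the text.

```python
# (cost, sale) in pounds for each trading department, from the two tables.
first = {"Food": (1000, 3000), "Chocolate": (100, 125),
         "Tobacco": (200, 230), "Wine": (700, 1260)}
later = {"Food": (1000, 3000), "Chocolate": (100, 125),
         "Tobacco": (300, 345), "Wine": (600, 1080)}

def department_ratios(table):
    """Ratio of sale to cost within each department."""
    return {name: sale / cost for name, (cost, sale) in table.items()}

def overall_ratio(table):
    """Ratio of total sale to total cost for the whole concern."""
    total_cost = sum(cost for cost, _ in table.values())
    total_sale = sum(sale for _, sale in table.values())
    return total_sale / total_cost

first_ratios = department_ratios(first)
later_ratios = department_ratios(later)
overall_first = overall_ratio(first)
overall_later = overall_ratio(later)

for name in first:
    print(name, round(first_ratios[name], 2), round(later_ratios[name], 2))
print("All departments", round(100 * overall_first, 1), round(100 * overall_later, 1))
```

Each department shows the same ratio at both dates, and the fall in the ratio for the whole concern arises only because more of the trade is in tobacco, where the margin is lowest, and less in wine.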

The point at issue is one of interpretation of the
calculated ratios. The fact that a ratio has changed
may be known. What does this mean? Is this change
due to a corresponding change in the individual unit's
possession of a particular character, or is the change due 
entirely or partly to some change in the constitution of 
the group formed by the individual units ? The tendency, 
in general, is to attribute the change to the first cause, 
and it is only on further examination of the statistics 
that it may be found that the second cause is also con- 
tributory. This difficulty is met, in practice, by splitting 
up the whole group into constituent parts when this is 
possible, and where it is felt to be desirable. For each 
part, in which the units may be considered alike in respect 
of the character or characters under discussion, separate 
ratios are calculated and comparisons are made between 
these ratios, subgroup by subgroup. Thus, if we wished 
to compare death-rates in one community with those in 
another, with the idea of finding out the effect of funda- 
mental racial or environmental conditions on this question 
of the likelihood of dying within a year, we should compare 
the death-rates of those in specified age-groups, and so 
evade the complication due to the fact that there may 
be in the different populations different proportions at 
various ages. Further we might consider the question 
of regrouping into occupations, because different 
occupations may have different death-rates, and a 
different occupation-constitution of the population may 
be a contributory cause to a change in the death rate. 
Similar remarks apply to marriage-rates. The fact of 
the marriage-rate changing may be due to change in 
custom, or it may be due entirely to a change in the age- 
constitution of that part of the population eligible for 
matrimony. Certainly before we can assert that such 
a change is due to the first cause, the ground has to be
examined to find out whether the second cause is not
the main reason for the change observed. 

We, therefore, split up the original group into sub- 
sidiary groups which contain units homogeneous or alike 
in respect of some character possessed by the individuals 
in different degrees, this character being suspected or 
known to be connected in some way with other characters, 
which are the subject of discussion by the consideration 
of changes in ratios. Thus, if we really want to find 
out about changes in custom in a community in respect
of consumption of tobacco, for instance, we should consider 
the problem from two angles, first, dealing with the changes 
in the proportion of smokers in the population over a 
series of years, secondly, dealing with the average annual 
consumption per head of that part of the population 
consisting of smokers, instead of considering the crude 
figures obtained as average consumption per head of the 
population. Unfortunately, of course, the number of 
smokers in the population is not known. Similarly, if
we wish to analyse changes taking place over a period
of time in the birth-rate, we relate the number of legitimate
births year by year to the number of married women
(ages 15-45), and get the average number of births per
100 married women of childbearing age. So with 
marriages, we get a better figure than the crude marriage 
rate, if we relate the number of men (say) marrying in 
a year to the number of bachelors and widowers of 
marriageable age available in that year. We try to get 
the numerator and denominator of our ratio so related, 
that the result will give correct information about the
question under consideration, and will not be influenced
by extraneous factors. As a further illustration the
table below is given. The Monthly Railway Statistics
issued by the Ministry of Transport include tables from
which these figures were extracted:

G. B. Railways

Average Wagon Load (tons). July, 1929 and 1930

Class of Freight:
  I    Merchandise and Livestock
  II   Minerals and Merchandise
  III  Coal, Coke, and Patent Fuel

Railway Company                  I             II            III       All Freight
and Area                      1929   1930   1929   1930   1929   1930   1929   1930

Great Western                 2.92   2.91   9.18   9.10   9.97  10.?4   5.84   5.66
  Western                     2.56   2.59   8.47   8.93   9.47   9.62   4.58   4.62
  Midland                     2.58   2.??   9.?2   9.46   9.21   9.40   4.82   4.83
  South Wales                 4.48   4.25  10.19   8.73  10.45  10.4?   8.40   7.90
London & North Eastern        2.83   2.75   9.51   9.17  10.01   9.96   5.81   5.68
  Southern (Eastern)          2.00   2.65   8.79   9.01   9.18   9.12   5.10   5.20
  Southern (Western)          2.74   2.68   9.35   9.42   9.44   9.53   5.98   5.97
  North Eastern               2.98   2.88  10.49  10.31  12.48  12.47   6.23   5.84
  Southern Scottish           3.03   2.87   8.49   8.53   8.90   8.89   5.22   5.03
  Northern Scottish           2.68   2.55   6.92   7.37   8.68   8.75   4.14   4.14
London, Mid. & Scottish       2.82   2.80   8.87   9.04   8.77   8.82   5.24   5.15
  Western                     2.94   2.95   8.82   9.04   8.92   9.00   5.02   4.96
  Central                     2.6?   2.??   8.08   8.39   9.19   9.32   5.31   5.44
  Midland                     2.53   2.58   9.18   9.27   8.61   8.??   5.50   5.47
  Northern (South)            3.52   3.28   8.62   8.69   8.79   8.71   5.26   4.87
  Northern (North)            2.38   2.21   7.??   7.87   8.76   8.72   4.04   3.81
Southern                      2.69   2.73   8.43   8.39   9.47   9.47   4.92   4.88
Cheshire Lines Committee      3.06   2.96   9.29   9.40   9.17   9.26   5.39   5.31
Metropolitan                  2.77   2.79   8.79   9.01   9.04   9.30   5.23   5.41
Midland & G.N. Joint          2.09   2.20   8.86   9.23   8.83   9.02   4.46   4.76

Great Britain                 2.83   2.80   9.10   9.15   9.48   9.49   5.51   5.41

The average figure worked out for the whole of Great 
Britain has changed from 5.51 tons (July, 1929) to 5.41
tons (July, 1930). But it is reasonable to suppose that
this figure will vary from district to district owing to the 
characteristic distribution of industry throughout the 
country, and in fact we find the figure highest in South 
Wales, and lowest in Northern Scotland. Moreover it 
would be anticipated that this figure would be influenced 
by changes in volume of different kinds of freight traffic.
We find in fact that, when freight traffic is divided into
the three main classes into which the railway companies
group their traffic, the figures are fundamentally different
from one another. The load in the case of minerals and 
coal is much greater than in the case of general 
merchandise. It is therefore preferable, if a comparison 
is to be made between operations in these two periods, 
that the average wagon load should be separately 
calculated, as in the table, for different areas and different 
classes of merchandise. A comparison of these figures 
enables us to decide whether any change has taken place 
in loading of wagons, which would not be obtained from 
the crude figures obtained with respect to the whole 
operations. Let us look at the figures relating to the 
operations on the London Midland and Scottish Railway 
(Western Section) reproduced here:

       I              II             III         All Freight
  1929   1930    1929   1930    1929   1930    1929   1930
  2.94   2.95    8.82   9.04    8.92   9.00    5.02   4.96


The average figure for all freight traffic has declined,
and we might conclude that the loading of wagons has not
been so efficiently performed in July, 1930, as in July,
1929, but the individual averages for the different kinds
of freight all show higher results. The average for all
freight is less than before purely and simply because
there has been a change in the relative volume of goods
of the different classes carried.

In these Railway Statistics we subdivide the whole
group into a number of sections and obtain ratios for
each, implementing the principle laid down in this
discussion of obtaining a ratio from a numerator and
denominator which relate to a homogeneous group. In
cases where this cannot be done, for instance in the case 
of the average consumption of tobacco per head of 
population, where we do not know the proportion of the 
population which consists of smokers, we have to be 
content with the crude ratio, which is open to the objection 
that the apparent reason why the ratio changes may not 
be the real reason, so that any conclusions drawn must 
be merely suggestive and not definite. 

In many cases, where the sort of difficulty discussed in 
the preceding pages arises, it is possible to overcome it 
by a simple device which renders possible a comparison 
between two groups by means of a single ratio, instead 
of by a number of ratios. Suppose we consider for 
illustrative purposes those figures in the table above 
already discussed relating to the London Midland and 
Scottish Railway (Western Section) in July, 1929, and 
July, 1930. If we examine the sources from which these 
figures were obtained we have the following information : — 


                              Class of Freight

                      I             II            III       All Freight
July,              1929   1930   1929   1930   1929   1930   1929   1930

Net Ton Miles
  (mn)               82     80     67     59     66     62    215    201
Loaded Wagon
  Miles (mn)       27.8   27.1    7.6    6.5    7.4    6.9   42.8   40.5
Average Wagon
  Load (tons)      2.94   2.95   8.82   9.04   8.92   9.00   5.02   4.96

(Average Wagon Load = Net Ton Miles ÷ Loaded Wagon Miles.)
Note: Only 2 or 3 significant figures are given in the above table.
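
The rule that Average Wagon Load equals Net Ton Miles divided by Loaded Wagon Miles can be applied directly to the figures above. In this sketch the dictionary layout and the function `average_loads` are hypothetical. Because the source figures are given to only 2 or 3 significant figures, the class averages agree with the table only approximately.

```python
# Net ton miles and loaded wagon miles (millions), L.M.S. Western Section, July.
ton_miles = {1929: {"I": 82, "II": 67, "III": 66},
             1930: {"I": 80, "II": 59, "III": 62}}
wagon_miles = {1929: {"I": 27.8, "II": 7.6, "III": 7.4},
               1930: {"I": 27.1, "II": 6.5, "III": 6.9}}

def average_loads(year):
    """Return (average load by class, average load for all freight)."""
    by_class = {c: ton_miles[year][c] / wagon_miles[year][c]
                for c in ton_miles[year]}
    overall = sum(ton_miles[year].values()) / sum(wagon_miles[year].values())
    return by_class, overall

by_class_1929, all_1929 = average_loads(1929)
by_class_1930, all_1930 = average_loads(1930)

# Share of Class I (the lightly loaded class) in total loaded wagon miles.
share_class_1 = {y: wagon_miles[y]["I"] / sum(wagon_miles[y].values())
                 for y in (1929, 1930)}

print(round(all_1929, 2), round(all_1930, 2))
print(round(share_class_1[1929], 3), round(share_class_1[1930], 3))
```

Every class average is higher in 1930, yet the average for all freight is lower, because Class I takes a larger share of the wagon miles at the later date.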


This table shows why the average wagon load for all
freight is less in July, 1930, than in July, 1929, although 
the averages for Classes I, II, III have increased in this 
year’s interval. There has been a heavy reduction 
relatively in Class II and Class III compared with Class I 
in wagon miles and ton miles, consequently Class I becomes 
relatively more important in the total at the later date, 
and since the average wagon load in this class is small 
compared with the other average figures, the general 
average is depressed. But suppose we relate the average 
wagon loads of these classes to some constant standard 
distribution of ton miles or wagon miles between these 
three classes, we can, by referring to this hypothetical 
distribution, find averages which will only reflect in their 
changes, those changes due to loads. Let us take the 
distribution of loaded wagon miles for the whole year
1929 as standard ; in this year the distribution between 
the three classes was : — 


Year 1929. Loaded Wagon Miles (mn)

            Class of Freight
    I        II       III     All Freight
   325       82        97         504


Consider the following Scheme : 


            (1)             (2)            (3)            (4)            (5)
          Loaded          Average       Ton Miles       Average       Ton Miles
Class   Wagon Miles,    Wagon Load,     (1) × (2)     Wagon Load,     (1) × (4)
           1929         July, 1929        (mn)        July, 1930        (mn)
           (mn)           (tons)                        (tons)

I           325            2.94           955.4          2.95           958.8
II           82            8.82           723.2          9.04           741.3
III          97            8.92           865.2          9.00           873.0

All         504                          2543.8                        2573.1

Standardized Average Wagon Load (All Freight)

July, 1929:  2543.8 ÷ 504 = 5.05 tons
July, 1930:  2573.1 ÷ 504 = 5.11 tons


We calculate what the ton miles would be for each class
of freight, given the average wagon loads of July, 1929,
and the loaded wagon miles of the standard year 1929.
From that total we get the average wagon load, 5.05 tons,
which we may call the standardized average wagon load
for July, 1929. Similarly we get the standardized average
wagon load for July, 1930. We can make these comparisons:

Average Wagon Load (Tons) July, 1929 and 1930

                  July, 1929    July, 1930
Crude                5.02          4.96
Standardized         5.05          5.11
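
The Scheme can be sketched as a weighted average in which the weights are the loaded wagon miles of the standard year. The function name `standardized_average` is hypothetical. The loads and weights are those of the tables above.

```python
def standardized_average(loads, standard_weights):
    """Average of class loads weighted by a fixed standard distribution."""
    total_ton_miles = sum(standard_weights[c] * loads[c] for c in standard_weights)
    return total_ton_miles / sum(standard_weights.values())

# Loaded wagon miles (mn) in the standard year 1929, by class of freight.
standard_1929 = {"I": 325, "II": 82, "III": 97}
# Average wagon loads (tons) for the two months compared.
loads_july_1929 = {"I": 2.94, "II": 8.82, "III": 8.92}
loads_july_1930 = {"I": 2.95, "II": 9.04, "III": 9.00}

std_1929 = standardized_average(loads_july_1929, standard_1929)
std_1930 = standardized_average(loads_july_1930, standard_1929)
print(round(std_1929, 2), round(std_1930, 2))
```

Since the weights are the same for both months, any difference between the two results can arise only from the loads themselves.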


When comparing 5.02 tons with 4.96 tons, the comparison
is vitiated because we have to take into account
the change in volume of traffic between the classes of
freight, but 5.05 tons compared with 5.11 tons gives us
a direct comparison, independent of this change. We
are enabled, in this way by standardizing the distribution 
of traffic, to get averages for all freight which will reflect 
changes due to one cause alone. Of course, the actual 
standardized averages calculated will depend partly on 
the period chosen as standard, but the comparison will 
not be vitiated on this account, and the comparison is 
the important point to be considered. For instance we 
might take the distribution in loaded wagon miles in the 
year 1930 as standard : — 


Year 1930. Loaded Wagon Miles (mn)

            Class of Freight
    I        II       III     All Freight
   315       75        92         482


Consider the following Scheme : — 


            (1)             (2)            (3)            (4)            (5)
          Loaded          Average       Ton Miles       Average       Ton Miles
Class   Wagon Miles,    Wagon Load,     (1) × (2)     Wagon Load,     (1) × (4)
           1930         July, 1929        (mn)        July, 1930        (mn)
           (mn)           (tons)                        (tons)

I           315            2.94           926.0          2.95           929.2
II           75            8.82           661.5          9.04           679.0
III          92            8.92           820.8          9.00           828.0

All         482                          2408.3                        2436.2

Standardized Average Wagon Load (All Freight)

July, 1929:  2408.3 ÷ 482 = 5.00 tons
July, 1930:  2436.2 ÷ 482 = 5.05 tons
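
The effect of the choice of standard can be seen by running the same calculation against both standard years. This is a sketch only: the names are hypothetical and the figures are those already tabulated.

```python
def standardized_average(loads, standard_weights):
    """Average of class loads weighted by a fixed standard distribution."""
    weighted = sum(standard_weights[c] * loads[c] for c in standard_weights)
    return weighted / sum(standard_weights.values())

loads_july_1929 = {"I": 2.94, "II": 8.82, "III": 8.92}
loads_july_1930 = {"I": 2.95, "II": 9.04, "III": 9.00}

# Loaded wagon miles (mn) by class in each candidate standard year.
standards = {
    "year 1929": {"I": 325, "II": 82, "III": 97},
    "year 1930": {"I": 315, "II": 75, "III": 92},
}

results = {}
for label, weights in standards.items():
    before = standardized_average(loads_july_1929, weights)
    after = standardized_average(loads_july_1930, weights)
    results[label] = (round(before, 2), round(after, 2))
    print(label, results[label], "rise" if after > before else "fall")
```

The levels differ with the standard adopted, but the direction of the change between the two months does not.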


The standardized averages are different from those
calculated with the 1929 figures as standard, which were
5.05 tons, 5.11 tons; but the comparison between July,
1929, and July, 1930, still indicates a rise from 1929 to
1930, and that is the important point, because the crude
average for all freight indicated a drop. Let us consider
a further illustration of this method of obtaining 
standardized averages ; this time taken from results of 
working in the Coal Mining Industry. The following 
figures refer to the six main coal producing districts in 
Great Britain, and the totals refer to operations in these 
six districts as a whole. 


Coal Mining Industry: Output per Man-shift (cwt.)

                  I            II           III       IV        V         VI
               Scotland  Northumberland   Durham    South    Midlands  Lancashire   Total
                                                    Wales

1924, 1st
  quarter       18.96        17.12         17.15    16.19     20.60      14.88      17.94
1925, 1st
  quarter       19.01        18.06         17.91    16.36     20.30      14.79      18.12



The output per man-shift varies from district to district.
The average for all these districts in the first quarter
of 1924 was 17.94 cwt., and in the first quarter of 1925
was 18.12 cwt. Increases in districts were recorded
between these dates in four cases (I, II, III, IV) and
decreases in the other two (V, VI). But between these
two dates there was a change in the distribution of work
done as the table below shows:


Man-shifts Worked (thousands)

                        I       II      III       IV        V       VI     Total
1924, 1st quarter
  Man-shifts          9,488   4,040   11,264   15,377   22,081    9,491   71,741
  Per cent             13.2     5.6     15.7     21.4     30.8     13.2      100
1925, 1st quarter
  Man-shifts          8,495   3,366    9,416   14,025   22,213    8,632   66,147
  Per cent             12.8     5.1     14.2     21.2     33.6     13.1      100
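
The dependence of the crude average on the distribution of work can be sketched from the two tables above. The function names are hypothetical. The crude average is computed as the district outputs per man-shift weighted by the man-shifts actually worked in the quarter, and it agrees with the printed totals only to within the rounding of the district figures.

```python
districts = ["I", "II", "III", "IV", "V", "VI"]

# Man-shifts worked (thousands) and output per man-shift (cwt.), by district.
man_shifts = {"1924 Q1": [9488, 4040, 11264, 15377, 22081, 9491],
              "1925 Q1": [8495, 3366, 9416, 14025, 22213, 8632]}
output_per_shift = {"1924 Q1": [18.96, 17.12, 17.15, 16.19, 20.60, 14.88],
                    "1925 Q1": [19.01, 18.06, 17.91, 16.36, 20.30, 14.79]}

def shares(counts):
    """Percentage distribution of work done between districts."""
    total = sum(counts)
    return [100 * c / total for c in counts]

def crude_average(counts, rates):
    """District rates weighted by the work actually done in the period."""
    return sum(c * r for c, r in zip(counts, rates)) / sum(counts)

share_1924 = shares(man_shifts["1924 Q1"])
share_1925 = shares(man_shifts["1925 Q1"])
crude_1924 = crude_average(man_shifts["1924 Q1"], output_per_shift["1924 Q1"])
crude_1925 = crude_average(man_shifts["1925 Q1"], output_per_shift["1925 Q1"])

print([round(s, 1) for s in share_1924])
print([round(s, 1) for s in share_1925])
print(round(crude_1924, 2), round(crude_1925, 2))
```

District V, where output per man-shift is highest, takes a larger share of the work at the later date, and this alone would tend to raise the crude average.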


In all these districts taken together there was a decline 
in the amount of work done, but in the case of the fifth 
district there was an increase. The percentage figures 
indicate the changes which have taken place district by 
district between the two periods. Now, since the output 
per man-shift is at different levels in different districts, 
the average for the whole is bound to be affected by the 
proportions of work done in these districts. Consequently 
any change in these proportions from one time to another 
will have an effect on the average figures, and therefore
these average figures do not truly reflect changes due 
merely to organization of the industry. If we want to 
find out exactly a measure of such changes we certainly 
must eliminate the effect of changing distribution of work 
done and output between districts. This can be done by 
obtaining standardized averages, by referring the output 
per man-shift figures for each district to a standard 
distribution of labour, which will not be changed with
time. Let us take as standard the distribution of labour
between districts which obtained during the whole year 
1924 ; this is given by the table : — 


Year 1924. Man-shifts Worked (thousands)

     I        II       III       IV        V        VI      Total
  36,010   15,045   42,145   58,149   84,121   35,637   271,107


Make the calculations indicated in the table below : — 


               (1)            (2)         (3)=(1)×(2)       (4)         (5)=(1)×(4)
           Man-shifts      Output per       Output       Output per       Output
District   in Standard     Man-shift,      mn. cwt.      Man-shift,      mn. cwt.
             period        1924, 1st                     1925, 1st
           (thousands)   quarter (cwt.)                quarter (cwt.)

I             36,010         18.95           682.5         19.01           684.6
II            15,045         17.12           257.6         18.05           271.5
III           42,145         17.15           722.3         17.91           754.8
IV            58,149         16.19           941.1         16.36           951.2
V             84,121         20.60          1732.8         20.30          1707.8
VI            35,637         14.88           530.1         14.79           527.0

             271,107                        4866.4                        4896.6

Standardized Average Output per man-shift

1924, 1st quarter:  4866.4 ÷ 271.107 = 17.95 cwt.
1925, 1st quarter:  4896.6 ÷ 271.107 = 18.06 cwt.


Column (1) gives the standard distribution of work
done, column (2) gives the district results in 1924, 1st
quarter, column (3) gives the output which would be
obtained if the amounts of work done were those in
column (1), and the results of working were those in
column (2). The total of column (3) would then be the
total output produced by the total man-shifts given as
the sum of the figures in column (1), and the relationship
of these is the output per man-shift for these districts
as a whole, 17.95 cwt. A similar calculation on the
working results in 1925, 1st quarter, shown in column (4),
combined with the standard distribution of work done in
column (1) gives the figures in column (5) showing the
output which would be obtained under those circumstances.
The average for the districts as a whole is
18.06 cwt. per man-shift. We may compare the crude
averages with these standardized figures:



                    1924,           1925,
                 1st quarter     1st quarter
Crude             17.94 cwt.      18.12 cwt.
Standardized      17.95 cwt.      18.06 cwt.


These figures show that, on the whole, output per man-shift
had increased between the two dates from 17.95 to
18.06 cwt. due to causes other than changes in the
distribution of work done between the districts, an increase
of 0.11 cwt., whereas the whole change in output from 17.94
to 18.12 cwt., i.e. an increase of 0.18 cwt., was partly due
to the change in the work done in different districts. We
could with reason say that 0.07 cwt. of this change is
due to this last cause alone.
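
The separation of the two causes can be followed in a short sketch. The district figures are those of columns (1), (2) and (4) of the calculation, and the crude averages are taken as printed. The variable names are hypothetical. Results are rounded to two places at each step, as in the text.

```python
# Standard distribution: man-shifts worked in the whole year 1924 (thousands).
standard_man_shifts = [36010, 15045, 42145, 58149, 84121, 35637]
# Output per man-shift (cwt.) by district, columns (2) and (4) of the table.
output_1924_q1 = [18.95, 17.12, 17.15, 16.19, 20.60, 14.88]
output_1925_q1 = [19.01, 18.05, 17.91, 16.36, 20.30, 14.79]
# Crude averages as printed in the text.
crude_1924, crude_1925 = 17.94, 18.12

def standardized_output(rates, weights):
    """District rates weighted by the fixed standard distribution of labour."""
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)

std_1924 = round(standardized_output(output_1924_q1, standard_man_shifts), 2)
std_1925 = round(standardized_output(output_1925_q1, standard_man_shifts), 2)

total_change = round(crude_1925 - crude_1924, 2)
change_other_causes = round(std_1925 - std_1924, 2)
change_from_distribution = round(total_change - change_other_causes, 2)

print(std_1924, std_1925)
print(total_change, change_other_causes, change_from_distribution)
```

The remainder is what is left of the crude change after the change shown by the standardized figures has been taken away.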

This method of obtaining standardized ratios or 
averages which will serve for comparative purposes 
instead of the original or crude ratios or averages,
enables us to separate the effects of different causes. This 
method is used when different areas are to be compared 
as to death-rates. As various areas have different
age-distributions of the population, some places having perhaps
relatively more old persons or more infants, and as the
death-rate changes with age, the differences in
age-distribution from district to district themselves would
cause differences between the death-rates, irrespective 
of whether there were differences between the death-rates 
in the same age-group from district to district. The 
effect of this cause on the death-rate is therefore removed 
by referring the death-rates of age-groups in each district 
to a standardized population age-distribution, and the 
standardized death-rates are worked out in the same way 
as has been indicated in the illustration above. 
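
The same procedure applied to death-rates may be sketched as follows. Every figure here is hypothetical and chosen only for illustration: two districts are given identical death-rates in each age-group but different age-distributions, and both are referred to one standard population.

```python
# Hypothetical age-specific death-rates (deaths per 1,000 persons per year)
# and hypothetical populations; none of these figures come from the text.
rates_a = {"0-4": 20.0, "5-44": 4.0, "45-64": 15.0, "65+": 80.0}
rates_b = dict(rates_a)  # identical rates in every age-group
pop_a = {"0-4": 10_000, "5-44": 60_000, "45-64": 20_000, "65+": 10_000}
pop_b = {"0-4": 8_000, "5-44": 50_000, "45-64": 22_000, "65+": 20_000}
standard_pop = {"0-4": 9_000, "5-44": 55_000, "45-64": 21_000, "65+": 15_000}

def weighted_rate(rates, population):
    """Death-rate per 1,000 when `rates` are applied to `population`."""
    deaths = sum(rates[g] * population[g] / 1000 for g in rates)
    return 1000 * deaths / sum(population.values())

crude_a = weighted_rate(rates_a, pop_a)         # each district's own population
crude_b = weighted_rate(rates_b, pop_b)
std_a = weighted_rate(rates_a, standard_pop)    # common standard population
std_b = weighted_rate(rates_b, standard_pop)

print(round(crude_a, 2), round(crude_b, 2))
print(round(std_a, 2), round(std_b, 2))
```

The crude rates differ solely on account of the age-distributions, and the standardized rates, which remove that cause, are equal.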

It is perhaps useful at this stage to interpolate some 
remarks on the place of ratios and averages, of the kind 
we have been discussing, in statistics. Statistics has been 
called the science of averages, and certainly these play a 
large part in any discussion of statistical results, because, 
as we have seen, they serve, in many cases, as the basis 
of the first simple kind of comparisons which can be 
made between different statistics. Especially is this the 
case when we are dealing with large numbers of units in
our groups. But their importance must not be
exaggerated. They may be considered as serving as
rough guides to the information contained in the original
figures from which they have been derived. Those
interested in the efficient working of a huge organization
like the railways look to such ratios and averages to
serve as pointers of progress towards a more efficient
utilization of their resources, and if these figures, of which
a great many are calculated, indicate changes which
appear to be taking place and which are considered
undesirable, the source of this change is sought in the
original figures; a change in one of these ratios or
averages suggests a prima facie case for investigation,
it does not necessarily mean that the change noticed in 
the ratio is connected with some loss of efficiency, there 
may be another and acceptable explanation. Similarly 
in the case of the Coal Mining Industry, the output per 
man-shift calculated at frequent intervals serves as an 
indication of progress in the industry. It is not supposed 
that every man-shift worked produces so much coal, 
there are many men employed in coal mines who are not 
working at the coal face, and if we wanted an indication 
of the work done by hewers we should work out another 
ratio altogether different, output per man-shift (hewers). 
But this average value will prove to be a guide to those 
interested in efficient working in this industry, and if 
any change occurs which suggests that coal is being 
produced at too expensive a rate of man-power, naturally 
investigation will be made to find out what changes in
methods of production have caused this change in the
ratio. In statistics relating to overcrowding in houses,
an average of 2.33 persons per room in a household should
not be considered in relation to any particular household;
one should not quibble about its being impossible to
consider 0.33 of a person,¹ but one should rather consider
this average figure as giving a rough indication of
circumstances relating to a whole group of households,
and if such a figure is referred to a similar figure obtained
from another group, say 1.24 persons per household, the
difference between the sizes of these figures, which are
representatives of each group, should call for comment
and investigation into the different set of circumstances
which give rise to these average results. These ratios or
averages, which have been the subject of discussion in the
previous pages, must be considered as broadly indicating the
possession of a particular characteristic by a group of
units, or the relation between two characteristics possessed
by the group; they are not to be considered as having
particular reference to any one individual of the group.
Moreover, they are not to be considered as replacing the
original information which is known relating to the
group; it is possible to obtain more knowledge of a group
than is indicated by such ratios or averages, by going
back to the tables from which these are calculated.

¹ If we do not like to speak of 2.33 persons per household we can
always think of 233 persons per hundred households, which will serve
equally well.

We may consider that so far we have progressed by 
three stages. In the first we have the full detailed 
information which results when the inquiry is instituted ; 
in the next stage, tabulation, a good deal of this detail
is lost, but our information is available in a more concise 
form ; in the next stage, when ratios and averages are 
obtained, still further detail is lost, and the information 
is available in a very simple form indeed. Or we may 
consider an analogy of this kind, the whole body of 
available information may be likened to a human body, 
the ratios and averages are the skeleton of this body, 
or they may be likened to the shadow of the body thrown 
by a light on a plane surface. 


Chapter 6 


THE CALCULATION OF AVERAGES 

It sometimes happens that the figures in the original 
tables do not give us the information in such a form 
that ratios which are required are immediately calculable. 
Some intermediate arithmetical processes are necessary. 
For instance, consider the table below, which gives 
information relating to the number of births which have 
occurred in the first 14* years of marriages, where husband 
and wife survived that length of time from the date of 
marriage, which were recorded in "Whitney, the
descendants of John Whitney, who came from London,
England, to Watertown, Massachusetts, in 1635"
(F. C. Pierce, 1895). (This table is taken from
Biometrika, XV, Parts 3 and 4, p. 415.)


Frequency of Occurrence of Births in
(a) Marriages before 1820; (b) Marriages 1840-1859

           (a) Marriages before 1820     (b) Marriages 1840-1859
               (1)           (2)             (3)           (4)
Births      Number of     Number of       Number of     Number of
            Marriages      Births         Marriages      Births

   0             7             0              42             0
   1             3             3              37            37
   2             9            18              87           174
   3            17            51              92           276
   4            29           116              52           208
   5            42           210              55           275
   6            58           348              41           246
   7            48           336               9            63
   8            15           120               3            24
   9             4            36               -             -
  10             1            10               -             -

Total          233          1248             418          1303




In this table column (2) is obtained from column (1) 
by multiplying the figures in this column by the 
appropriate number of births. Similarly the figures in 
column (4) are obtained from those in column (3). 

The appropriate ratios, the average number of births 
per marriage, are 1248/233 = 5.36 and 1303/418 = 3.12. In this table 
in its original form, where only columns (1) and (3) are 
shown, the total number of births, which is the numerator 
of the ratio, is not given, though the number of marriages 
is stated. The arithmetical calculations shown above 
have to be performed before we can get the ratio required. 
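
The arithmetic just described can be sketched in a few lines of Python. This is only an illustration of the calculation, not part of the original text; the names `before_1820`, `from_1840_to_1859` and `average_births` are our own, and the figures are columns (1) and (3) of the table.

```python
# Frequency tables: number of births -> number of marriages (columns (1) and (3)).
before_1820 = {0: 7, 1: 3, 2: 9, 3: 17, 4: 29, 5: 42, 6: 58, 7: 48, 8: 15, 9: 4, 10: 1}
from_1840_to_1859 = {0: 42, 1: 37, 2: 87, 3: 92, 4: 52, 5: 55, 6: 41, 7: 9, 8: 3}


def average_births(marriages_by_births):
    """Return (marriages, births, births per marriage) for one frequency table."""
    marriages = sum(marriages_by_births.values())
    # Columns (2) and (4): each row contributes (births) x (number of marriages).
    births = sum(b * n for b, n in marriages_by_births.items())
    return marriages, births, births / marriages


for label, table in (("before 1820", before_1820), ("1840-1859", from_1840_to_1859)):
    marriages, births, mean = average_births(table)
    print(f"{label}: {births}/{marriages} = {mean:.2f}")
```

The numerator of each ratio is never stored in the original table; it is rebuilt from the frequencies, which is the point of the intermediate step.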

Again consider such a table as that below, taken from 
the 1921 Census Report. 


Distribution of Private Families in Hunsworth U.D. (Yorkshire) 
according to Size of Family and Number of Rooms 

Number of                          Number of Rooms                                  Number
Persons                                                                               of
in Family      1     2     3     4     5     6     7     8     9    10    11    12 Families

    1          2     9     4     4     -     -     -     -     -     -     -     -     19
    2          4    25    20    16     4     3     1     -     -     -     -     -     73
    3          -    24    34    30    11     5     1     1     -     -     -     -    106
    4          1    14    24    19     9     6     1     1     1     -     -     -     76
    5          -     7    16     7     9     3     2     -     -     -     -     -     44
    6          -     4     5     7     5     1     -     1     -     -     -     1     24
    7          -     1     3     5     2     1     -     -     -     -     -     -     12
    8          -     1     3     -     -     -     -     -     -     -     -     -      4
    9          -     -     1     -     1     -     -     1     -     -     -     -      3
   10          -     -     1     -     -     -     -     -     -     -     -     -      1
   11          -     -     -     1     -     -     -     -     -     -     -     -      1

Number of
Families       7    85   111    89    41    19     5     4     1     -     -     1    363


This table presents the information relating to the 
possession by the units of the group (private families) 
of two characteristics, the number of persons in the family, 
and the number of rooms available to the family, and at 
the same time it gives us the various ways in which one 
character is possessed by those units of the group which 
are alike in respect of their possession of the other 
character. Thus, we have the separate distributions of 
rooms in the cases of those families with 1, 2, 3, . . . 
persons in the family, and the separate distributions of 
persons in the family in the case of those families with 
1, 2, 3, . . . rooms. There are therefore many interesting 
averages which may be calculated from the table on 
page 79. 

We may, for instance, obtain, as shown in the above 
calculations, the average number of persons per family 
for the whole group, and for those families in the group 
with the same number of rooms. These results may 
be set out in this fashion : — 



                     Whole                Families with
                     Group    2 Rooms   3 Rooms   4 Rooms   5 Rooms

Average Persons
per Family            3.62      3.07      3.75      3.65      4.27


From this we may see how the average number of 
persons per family (the average size of family) changes 
with the amount of accommodation (the number of 
rooms per family). Alternatively we may obtain the 
average number of rooms per family for the whole group, 
and for those families in the group containing the same 
number of persons. The calculations to this end are 
shown in the table on page 81. 




 Rooms     Whole Group     Families with   Families with   Families with   Families with
  per                        2 Persons       3 Persons       4 Persons       5 Persons
 Family   Number Number    Number Number   Number Number   Number Number   Number Number
            of     of        of     of       of     of       of     of       of     of
         Families Rooms   Families Rooms  Families Rooms  Families Rooms  Families Rooms

    1        7      7        4      4        -      -        1      1        -      -
    2       85    170       25     50       24     48       14     28        7     14
    3      111    333       20     60       34    102       24     72       16     48
    4       89    356       16     64       30    120       19     76        7     28
    5       41    205        4     20       11     55        9     45        9     45
    6       19    114        3     18        5     30        6     36        3     18
    7        5     35        1      7        1      7        1      7        2     14
    8        4     32        -      -        1      8        1      8        -      -
    9        1      9        -      -        -      -        1      9        -      -
   10        -      -        -      -        -      -        -      -        -      -
   11        -      -        -      -        -      -        -      -        -      -
   12        1     12        -      -        -      -        -      -        -      -

  Total    363   1273       73    223      106    370       76    282       44    167

 Average  1273/363         223/73          370/106         282/76          167/44
           = 3.51           = 3.05          = 3.49          = 3.71          = 3.80




These results may be shown in this way : — 



                     Whole                  Families with
                     Group    2 Persons  3 Persons  4 Persons  5 Persons

Average Rooms
per Family            3.51       3.05       3.49       3.71       3.80


This summary of the results enables us to see how, on 
the average, the number of rooms per family increases 
with the size of family. 
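
Both sets of averages come from the same two-way table, read once along its rows and once down its columns. The following Python sketch shows this under the assumption that the cell counts are those of the table on page 79; the names `families`, `average_rooms` and `average_persons` are ours and purely illustrative.

```python
# families[p][r - 1] = number of families with p persons and r rooms (r = 1..12).
families = {
    1:  [2, 9, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0],
    2:  [4, 25, 20, 16, 4, 3, 1, 0, 0, 0, 0, 0],
    3:  [0, 24, 34, 30, 11, 5, 1, 1, 0, 0, 0, 0],
    4:  [1, 14, 24, 19, 9, 6, 1, 1, 1, 0, 0, 0],
    5:  [0, 7, 16, 7, 9, 3, 2, 0, 0, 0, 0, 0],
    6:  [0, 4, 5, 7, 5, 1, 0, 1, 0, 0, 0, 1],
    7:  [0, 1, 3, 5, 2, 1, 0, 0, 0, 0, 0, 0],
    8:  [0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    9:  [0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    10: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    11: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}


def average_rooms(persons=None):
    """Average rooms per family, for the whole group or for one size of family."""
    rows = list(families.values()) if persons is None else [families[persons]]
    count = sum(sum(row) for row in rows)
    rooms = sum(r * n for row in rows for r, n in enumerate(row, start=1))
    return rooms / count


def average_persons(rooms=None):
    """Average persons per family, for the whole group or for one number of rooms."""
    cols = range(12) if rooms is None else [rooms - 1]
    count = sum(families[p][c] for p in families for c in cols)
    persons = sum(p * families[p][c] for p in families for c in cols)
    return persons / count


print(round(average_rooms(), 2), [round(average_rooms(p), 2) for p in (2, 3, 4, 5)])
print(round(average_persons(), 2), [round(average_persons(r), 2) for r in (2, 3, 4, 5)])
```

Holding one character fixed and averaging the other is all that either function does; the choice of which character to fix decides which summary table is produced.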

Moreover, it is possible also for us to construct from 
such a table as this, another table giving the distribution 
of these families according to the possession of another 
character, "number of persons per room", obtained by 
dividing the number of persons by the number of rooms 
in the case of each family. If reference is made to the 
original table we can replace this by another (page 83), 
where the values of this third character are inserted. 


We get the following distribution according to number 
of persons per room. 



Persons
per Room   0.25-  0.55-  0.85-  1.15-  1.45-  1.75-  2.05-  2.35-  2.65-  2.95-  3.25-  3.55-  3.85-  Total

Number of
Families      51     83     91     39     47     29      3      7      4      5      2      -      2    363


We may, alternatively, represent the distribution of 
families according to number of persons per room in a 
slightly different way, thus — 


Number of Persons      Less than    0.5-    1.0-    1.5-    2.0 and    Total
per Room                  0.5                                 over

Number of Families         19       115     130      53        46       363



Number of                                              Number of Rooms
Persons in
Family                             1     2     3     4     5     6     7     8     9    10    11    12

    1    Persons per Room       1.00  0.50  0.33  0.25     -     -     -     -     -     -     -     -
         Number of Families        2     9     4     4     -     -     -     -     -     -     -     -

    2    Persons per Room       2.00  1.00  0.67  0.50  0.40  0.33  0.29     -     -     -     -     -
         Number of Families        4    25    20    16     4     3     1     -     -     -     -     -

    3    Persons per Room          -  1.50  1.00  0.75  0.60  0.50  0.43  0.38     -     -     -     -
         Number of Families        -    24    34    30    11     5     1     1     -     -     -     -

    4    Persons per Room       4.00  2.00  1.33  1.00  0.80  0.67  0.57  0.50  0.44     -     -     -
         Number of Families        1    14    24    19     9     6     1     1     1     -     -     -

    5    Persons per Room          -  2.50  1.67  1.25  1.00  0.83  0.71     -     -     -     -     -
         Number of Families        -     7    16     7     9     3     2     -     -     -     -     -

    6    Persons per Room          -  3.00  2.00  1.50  1.20  1.00     -  0.75     -     -     -  0.50
         Number of Families        -     4     5     7     5     1     -     1     -     -     -     1

    7    Persons per Room          -  3.50  2.33  1.75  1.40  1.17     -     -     -     -     -     -
         Number of Families        -     1     3     5     2     1     -     -     -     -     -     -

    8    Persons per Room          -  4.00  2.67     -     -     -     -     -     -     -     -     -
         Number of Families        -     1     3     -     -     -     -     -     -     -     -     -

    9    Persons per Room          -     -  3.00     -  1.80     -     -  1.13     -     -     -     -
         Number of Families        -     -     1     -     1     -     -     1     -     -     -     -

   10    Persons per Room          -     -  3.33     -     -     -     -     -     -     -     -     -
         Number of Families        -     -     1     -     -     -     -     -     -     -     -     -

   11    Persons per Room          -     -     -  2.75     -     -     -     -     -     -     -     -
         Number of Families        -     -     -     1     -     -     -     -     -     -     -     -


This table emphasizes the number of families where 
fairly large numbers of persons are living in houses with 
a comparatively small number of rooms. Thus there are 
46 families out of 363 where the number of persons in 
the family related to the number of rooms is 2 or more. 
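
The derived character "persons per room" and its distribution can be computed directly from the two-way table. The sketch below is a hedged illustration only: the cell counts are assumed to be those of the table on page 79, exact fractions are used so that a ratio such as 7/4 falls cleanly on a class limit, and the names `families`, `persons_per_room_counts` and `bounds` are our own.

```python
from fractions import Fraction

# families[p][r - 1] = number of families with p persons and r rooms (r = 1..12).
families = {
    1:  [2, 9, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0],
    2:  [4, 25, 20, 16, 4, 3, 1, 0, 0, 0, 0, 0],
    3:  [0, 24, 34, 30, 11, 5, 1, 1, 0, 0, 0, 0],
    4:  [1, 14, 24, 19, 9, 6, 1, 1, 1, 0, 0, 0],
    5:  [0, 7, 16, 7, 9, 3, 2, 0, 0, 0, 0, 0],
    6:  [0, 4, 5, 7, 5, 1, 0, 1, 0, 0, 0, 1],
    7:  [0, 1, 3, 5, 2, 1, 0, 0, 0, 0, 0, 0],
    8:  [0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    9:  [0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    10: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    11: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}


def persons_per_room_counts(bounds):
    """Count families whose persons/rooms ratio lies in [bounds[i], bounds[i + 1])."""
    counts = [0] * (len(bounds) - 1)
    for persons, row in families.items():
        for rooms, n in enumerate(row, start=1):
            ratio = Fraction(persons, rooms)
            for i in range(len(counts)):
                if bounds[i] <= ratio < bounds[i + 1]:
                    counts[i] += n
    return counts


# Classes: less than 0.5, 0.5-, 1.0-, 1.5-, 2.0 and over.
# The final bound of 100 is only a sentinel above the largest possible ratio.
bounds = [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(100)]
print(persons_per_room_counts(bounds))
```

The last figure printed is the number of families with 2 or more persons per room, the group to which the text draws attention.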

Sometimes the information given in tabular form is 
such that it is impossible for us to determine the exact 
value of a ratio which we require, because we cannot get 
the numerator exactly, however many arithmetical 
processes we employ. Consider, for instance, such a 
table as this below, which gives the distribution of marks 
obtained by candidates in a certain examination, in a 
table such as is usually employed for this class of data. 


Marks                    35-   40-   45-   50-   55-   60-   65-   70-   75-   Total

Number of Candidates       1     5    12    34    23    22    23    12     1     133


In this kind of table, since the marks obtained by 
different candidates vary very widely, those are grouped 
together who get more or less the same marks, thus, there 
are 34 who get marks ranging from 50 to 54, there are 23 
who get marks ranging from 55 to 59, and so on. With 
the loss of the original detail, which accompanies the 
condensation of the data into tabular form, we now have 
the difficulty that we cannot find the total marks obtained 
by all candidates, which should be the numerator of the 
ratio (Total Marks obtained) / (Total Candidates). For 
many purposes it is essential that we should be able to 
obtain such an average from a table of this kind; 
consequently we have to overcome this difficulty somehow, 
and in practice we overcome it by means of a simple 
device which works well and which has a theoretical 
justification. We require the total number of marks of 
the whole 133 candidates ; some of this total will be 
contributed by the 34 candidates, for instance, whose 
marks range from 50 to 54. Now we do not know exactly 
how much these contribute, but we know that this amount 
will be as much as, or more than 34 x 50 = 1700, and we 
know that it will be less than, or as much as 34 x 54 
= 1836, because if the marks in this small group range from 
50 to 54 they may be all at 50 or all at 54 or spread between 
50 and 54. So that although we do not know the amount 
contributed exactly we know that it is between these 
limits 1700 and 1836. If we assume that the actual 
marks of those in this group are spread between 50 and 
54 without being particularly concentrated at any 
particular number of marks in this range, then we shall 
not be far wrong if we assume that the average mark 
of those in this group is half-way between its limits 50 
and 54, i.e. at 52 marks. In a table of this kind, this is 
a fair assumption which would only be unjustifiable if 
there was definite concentration of candidates at a special 
mark in the range, and if such concentration did exist 
then the original data would not be given to us in this 
particular tabled form. We must remember the origin 
of this kind of table. It is used in cases where so many 
and various are the values of the character possessed by 
the individuals in the group, that grouping together of 
those possessing the character in nearly the same manner 
is necessary. If the original data were such that the 
marks obtained by different candidates were the same in 
many cases, then this kind of table would not be necessary. 




In such a case as that we would have a table of this 
kind : — 

Marks Obtained 

Marks        37(?)  41  46(?)  49  50  52  53  58  59  62  64  66  67  70  74  76  Total

Number of
Candidates     1     5    5     7  10   9  15  11  12   6  14  13  10   7   5   1    133


We therefore assume that the average mark obtained 
by those candidates obtaining 50-54 marks is 52, and 
we get then an estimate of the contribution to the 
total marks of the whole group supplied by these 34 
candidates, this is 34 x 52 = 1768. This assumption 
is used for each group, and we get, in this way, an estimate 
of the total marks obtained which we can use as the 
numerator of our ratio. As a matter of fact, this method 
of finding an estimate of this nature can be shown 
mathematically, on certain reasonable assumptions, to 
be justified, but the presentation of this argument is not 
suitable at this stage. We will therefore proceed to show 
the arithmetical work involved in the calculation of the 
estimated total number of marks obtained : — 


                 Number        Estimated       Number
 Marks             of           Averages         of
               Candidates                       Marks

  35-               1              37             37
  40-               5              42            210
  45-              12              47            564
  50-              34              52           1768
  55-              23              57           1311
  60-              22              62           1364
  65-              23              67           1541
  70-              12              72            864
  75-               1              77             77

 Total            133                           7736  Estimated Total Marks

 Average Mark = 7736/133 = 58.2
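
The device described above, bounding each group's contribution and then taking the middle mark as its average, can be sketched as follows. This is an illustration only; the names `classes`, `WIDTH`, `class_limits` and `estimated_total` are our own, and each class is assumed to cover five whole marks (for example 50 to 54).

```python
# Lower class limits (marks) and numbers of candidates.
classes = [(35, 1), (40, 5), (45, 12), (50, 34), (55, 23),
           (60, 22), (65, 23), (70, 12), (75, 1)]
WIDTH = 5  # each class covers 5 whole marks, e.g. "50-" means 50 to 54


def class_limits(lower, n):
    """Least and greatest total marks that the n candidates in one class can contribute."""
    return n * lower, n * (lower + WIDTH - 1)


def estimated_total(classes):
    """Total marks, assuming each class averages its middle mark (52 for 50-54)."""
    return sum(n * (lower + (WIDTH - 1) // 2) for lower, n in classes)


candidates = sum(n for _, n in classes)
total = estimated_total(classes)
print(class_limits(50, 34))                    # limits for the 34 candidates in 50-54
print(total, round(total / candidates, 1))     # estimated total and average mark
```

The true contribution of any class lies somewhere between its two limits; the middle mark is simply the assumption that the marks are spread evenly within the class.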


[Table, p. 87: the original marks of the 133 candidates arranged by group, 
showing for each group its contribution to the total marks and the 
difference between the estimated and the correct figures.] 



Moreover, it may be pointed out that, unless there 
is definite bias in the distribution of the marks in the small 
groups, any error made in an estimate of the marks 
contributed to the total by one group will probably cancel 
with an error in the opposite sense made in respect of 
another group, so that the errors involved in these estimates 
are not cumulated when we get to the total estimate, 
which therefore should not involve a large error itself. 
Further, it must be remembered that when the average 
is calculated, this error is reduced considerably, seeing 
that it is divided by 133. The average then is likely 
to be very near the unknown correct result and can be 
trusted as a good approximation. It is interesting to 
test in this particular case the errors which are involved 
in these calculations. The original marks are easily 
referred to and may be shown in tabular form in the 
scheme on p. 87, where the contributions to the total 
marks are shown for each group and where the 
differences between the estimates and the correct figures 
are also shown ; some of these are positive and some 
negative. 

The correct average mark obtained from the original 
figures is (correct total marks)/133 = 58.0, the error in the estimate 
being 0.2 marks. This illustration may be considered 
as typical of the kind of error to which such an estimated 
average, calculated in this way, is liable. There is a 
further small point on method to which attention should 
be drawn. In the calculation of the estimated average 
some saving of arithmetical labour is possible by referring 
the marks to a new datum and by changing the mark- 
units. In the calculations worked in the table on p. 86, 
if 37, 42, 47, . . . are rewritten in a different form, the 
arithmetic is simplified. 



Estimated Number of Marks 

      1 x 57  - 5 x  4
      5 x 57  - 5 x 15
     12 x 57  - 5 x 24
     34 x 57  - 5 x 34
     23 x 57
     22 x 57  + 5 x 22
     23 x 57  + 5 x 46
     12 x 57  + 5 x 36
      1 x 57  + 5 x  4

    133 x 57  + 5 x 108 - 5 x 77

Total Marks = 133 x 57 + 5 x 31 

Average = (133 x 57 + 5 x 31)/133 
        = 57 + (5 x 31)/133 
        = 57 + 5 x 0.233 
        = 57 + 1.2 = 58.2 


37 is written as 57 —20 or 57 — 5 x 4, 42 as 57 — 15 
or 57 — 5 X 3, and similarly with the others. In this 
way the arithmetic is considerably reduced, the result of 
course is the same. This process involves measuring 
the marks from a new datum 57 marks, instead of no 
marks, and also expressing the new marks in 5 marks 
units. In practice the scheme works as in table on p. 90. 

                  New          Number
 Marks           Scheme          of         Products     Sums
                             Candidates

  35-             - 4             1           -  4
  40-             - 3             5           - 15
  45-             - 2            12           - 24
  50-             - 1            34           - 34
  55-               0            23                      - 77
  60-               1            22             22
  65-               2            23             46
  70-               3            12             36
  75-               4             1              4
                                                         + 108
                                133

 Total + 31.   Average + 0.233. 
 Average in original units and scales = 57 + 5 (0.233) = 58.2. 

By this simple device our multiplications are reduced 
to products by small numbers, 1, 2, 3, 4, 5, . . . The new 
datum is chosen at a convenient figure somewhere in the 
middle of the whole range of variations of the character, 
at a figure which is near those values of the character 
which are possessed by large numbers of the group. We 
try to replace the original measures of the variable 
by ± 1, ± 2, ± 3, . . . and at the same time to associate 
the smallest of these with the largest of the numbers which 
have to be multiplied. This insistence on reducing the 
arithmetical labour involved in the calculations is not 
altogether imposed because we do not want to do too 
many computations, but also because the simpler the 
calculations the less the liability is there to error in our 
results, and above all, we want accuracy. A further 
illustration (p. 91) should make the above points clear. 
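
The change of datum and unit can be written as a small general routine. The sketch below is our own illustration (the name `coded_mean` and the variable names are not from the text); it assumes each class is represented by its middle value, so that the plaice classes 18-, 20-, . . . are taken at 19, 21, . . . cm.

```python
def coded_mean(midpoints, counts, datum, unit):
    """Average found by measuring from a new datum in new units:
    datum + unit * sum(f * u) / sum(f), where u = (x - datum) / unit."""
    steps = [(x - datum) / unit for x in midpoints]
    total_steps = sum(f * u for f, u in zip(counts, steps))
    return datum + unit * total_steps / sum(counts)


# Examination marks: datum 57 marks, unit 5 marks.
mark_midpoints = [37, 42, 47, 52, 57, 62, 67, 72, 77]
mark_counts = [1, 5, 12, 34, 23, 22, 23, 12, 1]
print(round(coded_mean(mark_midpoints, mark_counts, 57, 5), 1))

# Lengths of plaice: datum 25 cm, unit 2 cm.
plaice_midpoints = list(range(19, 47, 2))  # 19, 21, ..., 45 cm
plaice_counts = [12, 101, 258, 297, 152, 80, 47, 22, 10, 6, 5, 4, 2, 2]
print(round(coded_mean(plaice_midpoints, plaice_counts, 25, 2), 2))
```

The result does not depend on which datum or unit is chosen; a convenient choice only makes the hand arithmetic lighter, as the text explains.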

Distribution of Length of Plaice Measured in 1907 
From Journal of Royal Statistical Society, 1925, p. 245 

                  New          Number
 Length          Units           of         Products     Sums
 (cm.)         and Scale       Plaice

  18-             - 3            12           -  36
  20-             - 2           101           - 202
  22-             - 1           258           - 258
  24-               0           297                      - 496
  26-               1           152             152
  28-               2            80             160
  30-               3            47             141
  32-               4            22              88
  34-               5            10              50
  36-               6             6              36
  38-               7             5              35
  40-               8             4              32
  42-               9             2              18
  44-              10             2              20
                                                         + 732
                                998

 Total + 236.   Average + 236/998 = + 0.23647 (new scale and units). 
 Average length (original scale and units) 
     = 25 + 2 (0.23647) cm. = 25.47 cm. 

In this table, the large numbers in the distribution 
are in the range 20 cm. to 30 cm., and we take the new 
datum at the middle of the range 24-26 cm. 

A further difficulty sometimes arises in the calculation 
of the average in those cases where the information in the 
tables lacks exactness. In the table below, for instance, 
taken from the Report of the Royal Commission on the 
Coal Industry 1925, Annex, p. 263, which gives information 
on the profitability of undertakings, this difficulty arises. 




                                  New Scale      Number
        Profit or Loss               and           of         Products    Sums
                                    Datum      Undertakings

          7s. and over              - 4             16         -  64
 Loss     5s.-                      - 3             15         -  45
          3s.-                      - 2             24         -  48
          1s.-                      - 1             65         -  65
 Under 1s. loss, 1s. profit           0            166                    - 222
          1s.-                        1            203           203
 Profit   3s.-                        2            120           240
          5s.-                        3             32            96
          7s. and over                4             12            48
                                                                          + 587
                                                   653

 Total + 365.   Average = + 365/653 = 0.559. 
 Average profit, original scale : 0s. + 2(0.56)s. = 1.12s. 

If we wish to calculate the average profit or loss per 
ton for the whole group, we can make the same kind of 
assumptions as before as to the estimated average profit 
or loss in the case of each small group of undertakings in 
the table, except in respect of the first and last. We were 
guided before by the limits of the ranges and assumed the 
average to be in the middle, but we obviously cannot do 
this when the limits of a group are not both defined. What 
is the estimated average loss per ton of the 16 undertakings 
in a group described as 7s. and over, what is the estimated 
average profit per ton of the 12 undertakings in the group 
with a profit of 7s. and over ? We do not know whether 
the greatest loss incurred in a particular case is 9s., 10s., 
or 15s. ; similarly we do not know the upper limit to which 
profits extended. Consequently, any assumption we make 
as to the contribution from these two classes to the total 
profit or loss for all undertakings will have a further 
disturbing influence on this total. In practice, if we 
cannot have access to the original data so that more 
precise knowledge may be gained, we would base these 
estimates on the assumption that there was the same 
interval between the limits of these extreme groups as 
in the case of the other groups where the limits are known. 
Thus we should make our calculations as shown in table 
on p. 92. 

It is interesting to find out the difference between this 
average and that calculated on some other assumption. 
Let us suppose, for instance, that the losses and profits 
in the extreme cases ranged up to 11s. in each, then our 
calculations would be shown in this way: 




                                  New Scale      Number
        Profit or Loss               and           of         Products    Sums
                                    Datum      Undertakings

          7s. and over              - 4½            16         -  72
 Loss     5s.-                      - 3             15         -  45
          3s.-                      - 2             24         -  48
          1s.-                      - 1             65         -  65
 Under 1s. loss, 1s. profit           0            166                    - 230
          1s.-                        1            203           203
 Profit   3s.-                        2            120           240
          5s.-                        3             32            96
          7s. and over                4½            12            54
                                                                          + 593
                                                   653

 Total + 363.   Average : + 363/653 = 0.555. 
 Average profit, original scale : 0s. + 2(0.555)s. = 1.11s. 
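
The effect of the assumption made about the two open-ended classes can be tried out quickly. In the sketch below, which is only an illustration, the name `average_profit` and the parameter `end_step` are our own; `end_step` is the position on the new scale (datum 0s., unit 2s.) at which both extreme classes are placed.

```python
# Numbers of undertakings in each class, from the greatest loss to the greatest profit.
counts = [16, 15, 24, 65, 166, 203, 120, 32, 12]
UNIT = 2.0  # shillings represented by one step of the new scale; the datum is 0s.


def average_profit(end_step):
    """Average profit per ton in shillings when the two open-ended classes
    are placed at -end_step and +end_step on the new scale."""
    steps = [-end_step, -3, -2, -1, 0, 1, 2, 3, end_step]
    return UNIT * sum(f * u for f, u in zip(counts, steps)) / sum(counts)


same_width = average_profit(4)    # extreme classes treated as 7s. to 9s.
up_to_11s = average_profit(4.5)   # extreme classes treated as 7s. to 11s.
print(round(same_width, 2), round(up_to_11s, 2))
```

Because the two extreme classes pull in opposite directions and contain few undertakings, moving both of them changes the average very little.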



The change in average profit per ton is one hundredth 
part of a shilling, about one eighth of a penny, a negligible 
amount in this connection. There are two factors which 
enter into the calculations which reduce this difficulty 
in cases of this sort to negligible proportions. The first 
is that errors involved in assumptions of the averages of 
the extreme groups will tend to balance one another in the 
final total,¹ and the other is that the numbers involved 
in this kind of table are small compared with the total 
numbers, so that any error introduced is reduced 
considerably when the numerator is divided by the 
denominator. Of course, if the number of cases at the 
extremes of a table like this were large compared with 
the other numbers in the table, this method of procedure 
could not be adopted because the errors introduced then 
might be considerable, but the fact is, that if these numbers 
were large, more precise information would be given in 
the original table about them. It is only because the 
numbers in this kind of table at the extremes are small 
that they are considered as rather of less importance than 
the others, and the compilers of the table dismiss them 
with, “ there is a small group of 12 out of 653 with a profit 
of 7s. and over, and it is not worth while to specify exactly 
what these individual profits are.” 

¹ We do not mean by this that there is in all such cases complete 
cancellation. 

Finally, let us consider the table on p. 95 relating to 
numbers of incomes assessed for super-tax, and the amount 
of these incomes. 

UK Distribution of Incomes Liable to Super-tax, 1924-5 

    (1)           (2)            (3)            (4)           (5)
  Income       Number of      Total of       Average        Middle
  (£000)        Persons       Incomes        Income        Income of
                              assessed       (£000)          range
                               (£000)                       (£000)

    2-           23,225        51,964          2.24           2.25
    2.5-         15,493        42,358          2.73           2.75
    3-           18,385        63,242          3.44           3.5
    4-           10,294        45,909          4.45           4.5
    5-            6,382        34,755          5.45           5.5
    6-            4,303        27,818          6.47           6.5
    7-            3,051        22,817          7.47           7.5
    8-            3,957        35,414          8.95           9.0
   10-            4,606        55,687         12.2           12.5
   15-            1,970        33,929         17.4           17.5
   20-            1,025        22,728         22.1           22.5
   25-              554        15,087         27.2           27.5
   30-              589        20,231         34.4           35.0
   40-              304        13,532         44.5           45.0
   50-              327        19,475         59.5           62.5
   75-              128        10,977         85.6           87.5
  100-              143        28,809        201.5             -

  Total          94,736       544,732          5.75            -

The original table from which these figures were 
extracted gave the columns (1), (2), and (3). Column (4) 
is obtained by dividing the figures of column (3) by the 
corresponding figures of column (2). Column (5) is 
obtained from column (1). It is necessary to note that 
the figures in column (4) are all less than the corresponding 
figures of column (5). Now if the whole of the information 
available to us were contained in columns (1) and (2), and 
if we proceeded to the estimate of the total income 
assessed for this purpose by using column (5), and 
multiplying these figures by those in column (2), and then 
adding the results, the estimated average would definitely 
be in excess of its true value, because all the figures in 
column (5) are greater than the true averages for each 
income grade. The method described above for obtaining 
an estimate of the average of a group which shall be close 
to the real figure definitely breaks down in this case. 
This method can only be used with safety when the 
distribution of the numbers in the group, according 
to their possession of a given characteristic, is such that 
only small numbers of the group have the largest and 
smallest values of the characteristic, while the main body 
of the group possess the character to a moderate extent. 
Thus, in the table showing the distribution of marks of 
133 candidates, only 6 had less than 45 marks, only 13 
had more than 70 marks, the remainder had marks 
between 45 and 70 marks. In such a table the numbers 
in the group, when distributed according to the values 
of a particular characteristic, increase from zero to a 
maximum and then decrease again to zero. Now in 
the table of those liable to super-tax, the largest group of 
persons is that with incomes above £2,000 but less than 
£2,500, the first group in the table, thereafter the numbers 
gradually diminish. This is an entirely different kind 
of distribution from the others already considered, and 
with this type the assumptions we made are not justified, 
and the method described of calculating the average, 
cannot be used. The difference between these types of 
distribution is very evident when they are presented in 
graphical form. 
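
The failure of the middle-value method on the super-tax figures can be checked from columns (1), (2) and (3) alone. The following sketch is our own illustration; the names `grades` and `midpoint_estimate` are hypothetical, incomes are in £000, and the open-ended grade "100-" is left out because it has no middle value.

```python
# (lower limit, upper limit, number of persons, total of incomes assessed), in £000.
grades = [
    (2, 2.5, 23225, 51964), (2.5, 3, 15493, 42358), (3, 4, 18385, 63242),
    (4, 5, 10294, 45909), (5, 6, 6382, 34755), (6, 7, 4303, 27818),
    (7, 8, 3051, 22817), (8, 10, 3957, 35414), (10, 15, 4606, 55687),
    (15, 20, 1970, 33929), (20, 25, 1025, 22728), (25, 30, 554, 15087),
    (30, 40, 589, 20231), (40, 50, 304, 13532), (50, 75, 327, 19475),
    (75, 100, 128, 10977),
]


def midpoint_estimate(lower, upper, persons):
    """Income a grade would contribute if every person had the middle income."""
    return persons * (lower + upper) / 2


# Excess of the middle-value estimate over the recorded total, grade by grade.
overestimates = [midpoint_estimate(lo, hi, n) - total for lo, hi, n, total in grades]
estimated = sum(midpoint_estimate(lo, hi, n) for lo, hi, n, _ in grades)
actual = sum(total for *_, total in grades)
persons = sum(n for _, _, n, _ in grades)

print(all(excess > 0 for excess in overestimates))
print(round(estimated / persons, 2), round(actual / persons, 2))
```

Here the errors do not cancel: in every grade the persons are crowded towards the lower limit, so every estimate errs in the same direction.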


Chapter 7 


GRAPHICAL METHODS 


Statistical data which are available in the form of 
tables are represented graphically with the object of 
making possible the appropriate comparisons more easily. 
Most people can normally appreciate the relative sizes of 
a number of figures more readily when these are pictorially 
presented, than they can by looking at a table. As an 
aid therefore to a proper realization of the relative sizes 
of different numbers, graphical methods are extremely 
useful. In some cases, of course, a graphical representation 
may merely give a rough approximation to the actual 
figures; this depends on the scale which is used. For 
instance, it is difficult to distinguish between 313 and 314 
when represented on a scale where 100 is equal to 1 in.; 
the difference might exist and be important, but it would 
not be noticeable in such a diagram. 

Diagrams which are used to show the numbers of units 
in a group possessing different values of a character are 
of two kinds. In the first kind, which is illustrated in 
Diagram 1, showing the number of families with different 
numbers of persons in the family (from the table p. 78), 
the numbers of persons per family are represented on a 
convenient horizontal scale, and the number of families 
corresponding are represented vertically by means of 
separate rectangular blocks of the same thickness. The 
number of families is indicated in each case by the height 
of the appropriate rectangular block, which is given 
thickness merely in order to improve the appearance of 
the diagram. The scale on which the number of families 
is represented is a linear scale and, of course, a line must 
be given thickness, if it is to be visible, and we go further 
in practice and instead of a line, use a rectangle. In 
some cases this rectangle is nothing more than a line of 
some considerable thickness. This practice is general 
in all cases of this kind where the variable character 
can be counted. We differentiate between the different 
numbers of the character by separating them on 
the horizontal scale by sufficient space, so that neighbouring 
rectangular blocks or thick lines are absolutely 
distinct from each other. 

[Diagram 1. 1921. Hunsworth U.D. Distribution of 363 Private Families 
according to number of persons in Family. Horizontal scale: Numbers of 
Persons in Family.] 

In the second kind, where the variable character is 
measurable, it is represented again on a horizontal scale, 
but the number of units in the group possessing the 
character between certain limits is represented graphically 
by means of areas. The scale used for the numbers is an 
area scale. Since the number which we wish to represent 
graphically is a number of units possessing the character 
between certain limits, it is reasonable to do this by means 
of a figure standing on a base corresponding to this 
interval. The simplest figure of this kind is a rectangle, 
and the area of the rectangle corresponds to the number 
of cases having values of the character coming within 
the range of that interval. Thus Diagram 2 gives a 
graphical representation of the table on p. 84. The 
areas of the rectangles correspond to the numbers of 
candidates with marks between the various limits shown 
in the table. The rectangles in this kind of diagram are 
contiguous, and this is reasonable, since the numbers in 
the original table merge into one another in the sense that 
there will be certain individuals with 49 marks (say) 
who should be represented in a diagram in close proximity 
to individuals with 50 marks. Diagram 3 illustrates a 
difficulty which is encountered in this kind of graph. 
In the original table the two extreme groups are merely 
indicated by the vague description "loss of 7s. and over", 
"profit of 7s. and over". The vagueness of description 
gives rise to a perplexity when we wish to make a graphical 

[Diagram 3. Coal Mining Industry. G.B. 1923. Distribution of Undertakings 
by Profit or Loss.] 

[Diagram 4. Warrington. Ages of Textile Workers (Female).] 



Census 1921. Warrington 
Occupied Females 12 years old and over 
Age-distribution of those engaged in Textile Occupations 

Age       12-   14-   16-   18-   20-   25-   35-   45-   55-   60-   65-69   Total

Number     14   346   290   280   492   344   118    43    10     6      4     1947


The graphical representation of such a table consists of 
a number of rectangles of different widths corresponding to 
the changing age-limits of the grades. 

Great care must be taken over the construction of these 
diagrams. When the grade intervals are the same, the 
rectangles stand on equal bases, and the areas are proportional 
to the heights of these rectangles, thus for practical 
purposes a linear scale is used for plotting. But when the 
grade intervals are not the same, the persons constructing 
the diagram must think in terms of areas and not lengths, 
when the appropriate rectangles are being drawn. 
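
Thinking in areas amounts to dividing each number by the width of its grade before plotting. The sketch below illustrates this for the Warrington age-distribution; it is our own illustration, the name `rectangle_heights` is hypothetical, the closing limit of 70 for the grade 65-69 is an assumption, and the numbers for the grades 12- and 35- (14 and 118) are our reading of partly illegible entries, chosen to agree with the printed total of 1947.

```python
# Lower age limits of the grades, closed at 70 for the grade 65-69 (an assumption),
# and the number of workers in each grade.
limits = [12, 14, 16, 18, 20, 25, 35, 45, 55, 60, 65, 70]
numbers = [14, 346, 290, 280, 492, 344, 118, 43, 10, 6, 4]


def rectangle_heights(limits, numbers):
    """Height of each rectangle so that its area (height x width) equals the number."""
    widths = [upper - lower for lower, upper in zip(limits, limits[1:])]
    return [n / w for n, w in zip(numbers, widths)]


heights = rectangle_heights(limits, numbers)
for lower, n, h in zip(limits, numbers, heights):
    print(f"{lower:>2}-  number {n:>3}  height {h:6.1f} per year of age")
```

The grade 20- contains the largest number of workers, yet because it is five years wide its rectangle is lower than those of the two-year grades before it; plotting the raw numbers as heights would hide this.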

Diagram 5 again brings out the use of areas to represent 
numbers of units. This diagram shows graphically the 
table below giving the age-distribution of bachelors in 
England and Wales who married in 1925. This table 
gives the numbers at each year of age, and the numbers 
in 5-year groups. In the diagram the area of each 
rectangle on the wider bases is equal to the total of the 
areas of the corresponding rectangles on the narrower 
bases (1 year). 


England and Wales: Ages of Bachelors who Married in 1925

Age   Number     Age   Number     Age   Number
16        14     39     1,963     62        65
17       108     40     1,710     63        60
18       971     41     1,311     64        47
19     3,320     42     1,297     65        64
20     7,596     43       968     66        31
21    18,698     44       928     67        27
22    23,233     45       867     68        30
23    26,538     46       659     69        35
24    29,129     47       594     70        24
25    28,024     48       497     71        16
26    22,805     49       483     72         9
27    19,243     50       436     73        10
28    16,022     51       331     74        13
29    13,206     52       311     75         2
30    10,828     53       233     76         2
31     8,209     54       215     77         2
32     6,629     55       196     78         -
33     4,987     56       167     79         1
34     4,245     57       140     80         1
35     3,627     58       134     81         -
36     3,096     59       105     82         1
37     2,501     60       122     83         1
38     2,380     61        89



Distribution in Age Groups

Age         Number
Under 20      4,413
20-         105,194
25-          99,300
30-          34,898
35-          13,567
40-           6,214
45-           3,100
50-           1,528
55-         [illegible]
60-             388
65-             187
70-              72
75-              10


In the age-group 30-34 there are altogether 34,898
bachelors, of whom 10,828 are 30 years old, 8,209 are 31,
6,629 are 32, 4,987 are 33, 4,245 are 34; and since the large
difference between the numbers in successive 5-year groups,
e.g. 34,898 in the group 30-34 and 13,567 in the group
35-39, is due to the grouping (we note that there are 3,627
at 35, 3,096 at 36, and so on, numbers which are of the same
order as 4,987 at 33 and 4,245 at 34), it would perhaps be
preferable if, instead of making the graph with discontinuous
"steps" which are seen when rectangular blocks are used, we
employed trapezia which would give a continuous outline
to the graph. This is done in the hope that the graphical
representation thus obtained will be a better approximation
to the original distribution than that when rectangles
are used. This suggestion is followed up in the
construction of Diagram 6, where the same data are again
shown, both by rectangles and trapezia. If this diagram
is compared with Diagram 5 it will be seen that the outline made
by the sloping sides of the trapezia conforms to that made
by the rectangles when the numbers at each age are
plotted. This polygon diagram is, in fact, preferable to
the rectangular block diagram (sometimes called a
histogram), but it is normally more troublesome to draw,
and the histogram type of graph is generally used, always
understanding that this is really a crude approximation
to a shape, to which the polygon is a better approximation.

Diagram 6. England and Wales. Age-distribution of Bachelors who
married in 1925.

The polygon graph is constructed in the following
manner. Let us suppose that we are dealing with the
graphical representation of a table such as this, where the
histogram would show rectangles increasing in height
and then decreasing again as $x$ increases.

Variable character   $x$-    $(x+h)$-   $(x+2h)$-   ...   $(x+(m-1)h)$-
Number               $n_0$   $n_1$      $n_2$       ...   $n_{m-1}$

The widths of the trapezia are $h$ units; suppose this
corresponds to $k$ inches on the horizontal scale. We start
the graph with a triangle (instead of a trapezium) at one end
of the scale. If one unit is to be represented by $kl$ square
inches, then if $n_{m-1}$ (say) is represented by a triangle
of $n_{m-1}kl$ sq. in. on a base of $k$ in., the vertical side of the
triangle is $2n_{m-1}l$ in. Then $n_{m-2}$ is represented by a
trapezium of $n_{m-2}kl$ sq. in. area, on a base $k$ in., one
vertical side of which has just been determined as
$2n_{m-1}l$ in., and the other will necessarily be
$(2n_{m-2} - 2n_{m-1})l$ in. The next number, $n_{m-3}$, is represented by
a trapezium of area $n_{m-3}kl$ sq. in., on a base of $k$ in.; one
vertical side is already determined as $(2n_{m-2} - 2n_{m-1})l$ in.,
the other is therefore $(2n_{m-3} - 2n_{m-2} + 2n_{m-1})l$ in.
Subsequent trapezia are constructed in like manner. The
trapezium corresponding to $n_s$ on a base from $x + sh$
to $x + (s+1)h$ would have the vertical side at $x + sh$ of
length $2\left(n_s - n_{s+1} + n_{s+2} - \dots + (-1)^{m-1-s} n_{m-1}\right)l$ in., and
the other vertical side at $x + (s+1)h$ of length
$2\left(n_{s+1} - n_{s+2} + n_{s+3} - \dots + (-1)^{m-s} n_{m-1}\right)l$ in. The last
trapezium (on a base $x$ to $x + h$) would be a triangle if
$2\left(n_0 - n_1 + n_2 - \dots + (-1)^{m-1} n_{m-1}\right)l$ in. is equal to
zero. This is not usually the case; for instance, in
the illustration used, where $n_0 = 4{,}413$, $n_1 = 109{,}607$,
$n_2 = 99{,}300$, etc., $n_0 - n_1 + n_2 - \dots = -31{,}387$, and in
such cases we must either have a trapezium instead of
a triangle, or else have a triangle on a smaller base. This
latter alternative is necessary in our illustration. Here
the vertical side of the trapezium at 20 years is
$2(35{,}800)l$ in. The area of the triangle of which this is
to be the vertical side is $4{,}413kl$ sq. in.; the length of
the base is therefore $\frac{4{,}413}{35{,}800}k$ in., and since 5 years is represented
by $k$ in., this corresponds to about one eighth of
5 years, i.e. just over one half of a year.
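The construction just described amounts to a simple recurrence: starting with a side of zero at the upper end, each new vertical side is twice the number in the group less the side already drawn, all measured in units of $l$. The sketch below works this through for a small set of numbers which are purely hypothetical and chosen only to keep the arithmetic easy to follow; the function name `polygon_sides` is likewise an assumption.

```python
def polygon_sides(counts):
    """Vertical sides (in units of l) at each group boundary, equal widths.

    sides[s] is the side at the lower limit of group s; the side at the
    upper end of the last group is zero, since the graph ends in a triangle.
    A group of n units has area n*k*l on base k, so its two sides sum to 2*n*l.
    """
    sides = [0.0] * (len(counts) + 1)
    for s in range(len(counts) - 1, -1, -1):
        sides[s] = 2 * counts[s] - sides[s + 1]
    return sides


# Hypothetical numbers n_0 ... n_4, not taken from any table in the text.
counts = [10, 30, 25, 12, 4]
sides = polygon_sides(counts)
print(sides)

# The side at the lower end equals twice the alternating sum n_0 - n_1 + n_2 - ...
alternating = sum(n if i % 2 == 0 else -n for i, n in enumerate(counts))

# When that side is negative, the first group is drawn as a triangle on a
# smaller base: area n_0*k*l = (1/2) * base * sides[1]*l, so the base is
# the following fraction of k.
first_base_fraction = 2 * counts[0] / sides[1]
print(first_base_fraction)
```

With these numbers the side at the lower end comes out negative, so the first group is shown by a triangle whose base is about three quarters of the full width, in the same way that the group under 20 years in the text occupies only about one eighth of its 5-year interval.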

When the length of the class interval changes the
procedure is not so easily described in general terms, but
an illustration will suffice to show what is done. Suppose
we wished to represent in this way the following table.


Districts in England and Wales having less than 5,000 Insured
Workers, distributed according to Unemployment percentage,
at 17th September, 1928

Percentage Unemployment   Under 2   2-   4-   6-   8-   10-   15-   20-   30-   40-   Total
Number of districts            17   40   37   26   20    38    21    16    13     8     236


Take as scale of unemployment percentage, 10 per cent
equal to 1 in., and 50 districts equal to 1 sq. in. Start
with the 17 districts with percentage greater than 0, but
less than 2; the area to correspond will be 0.34 sq. in., and
this will be represented by a triangle on a base of 0.2 in.
and height 3.4 in. The next number, 40, will be represented
by the area 0.80 sq. in. of a trapezium on base 0.2 in., one vertical
side being 3.4 in., the other 4.6 in., and so on. We can
show the results of these calculations in the table below:


Percentage      Number    Area       Base     Height of sides
Unemployment              (sq. in.)  (in.)    (in.)

                                              0
Under 2           17      0.34       0.2      3.4
2-                40      0.80       0.2      4.6
4-                37      0.74       0.2      2.8
6-                26      0.52       0.2      2.4
8-                20      0.40       0.2      1.6
10-               38      0.76       0.5      1.44
15-               21      0.42       0.5      0.24
20-               16      0.32       1.0      0.40
30-               13      0.26       1.0      0.12
40-                8      0.16       2.67     0


The last calculation concerning the base of a triangle
of area 0.16 sq. in. and height 0.12 in. gives 2.67 in. Thus
the polygon ends at 66.7 on the scale of percentages,
whereas actually if we consult the original figures we
find that the highest percentage unemployment of a
district was 69.5. The graph drawn from these
calculations is shown in Diagram 7.
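The heights in the table above can be checked by a short calculation. The following is a sketch only, using the scales stated in the text (10 per cent to 1 in., 50 districts to 1 sq. in.); the names `polygon_heights`, `UNITS_PER_INCH` and `DISTRICTS_PER_SQ_INCH` are hypothetical. Each group has area equal to half its base times the sum of its two vertical sides, so the new side is twice the area divided by the base, less the side already drawn.

```python
UNITS_PER_INCH = 10          # 10 per cent of unemployment is drawn as 1 in.
DISTRICTS_PER_SQ_INCH = 50   # 50 districts are drawn as 1 sq. in.

lower_limits = [0, 2, 4, 6, 8, 10, 15, 20, 30, 40]   # last group is open
counts = [17, 40, 37, 26, 20, 38, 21, 16, 13, 8]


def polygon_heights(lower_limits, counts):
    """Vertical side (in.) at the upper limit of every closed group."""
    heights = []
    previous = 0.0   # the polygon starts from zero height: first figure is a triangle
    for lower, upper, count in zip(lower_limits, lower_limits[1:], counts):
        base = (upper - lower) / UNITS_PER_INCH
        area = count / DISTRICTS_PER_SQ_INCH
        previous = 2 * area / base - previous
        heights.append(previous)
    return heights


heights = polygon_heights(lower_limits, counts)

# The open last group is closed by a triangle whose height is already fixed,
# so its base follows from area = (1/2) * base * height.
last_area = counts[-1] / DISTRICTS_PER_SQ_INCH
last_base = 2 * last_area / heights[-1]
end_of_polygon = lower_limits[-1] + last_base * UNITS_PER_INCH

print([round(h, 2) for h in heights])
print(round(last_base, 2), round(end_of_polygon, 1))
```

The heights agree with the last column of the table, and the polygon ends at about 66.7 per cent, as stated.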

The distribution of incomes liable to super-tax shown 
in the table on p. 95 is represented graphically in 
Diagram 8. 

In all these diagrams it is to be observed that when
there are great contrasts between the sizes of some of the
figures to be plotted, the smallest are so small that they
cannot be shown on the graph on the scale chosen, which
is determined by the range of figures in the table and the
limited size of the paper at our disposal.

Diagram 7. England and Wales, 1928 (17th September). Distribution
of Districts according to Unemployment Percentage.


gives, as a matter of fact, a graphical representation of
another table obtained from the original one. This table
is called a cumulative table and it shows numbers of units
in the group having less or more than successive values
of the measurable character.

Diagram 8. United Kingdom, 1924-5. Distribution of Incomes
liable to Super-tax.

Thus the table relating to Unemployment Percentages in different districts,

Percentage Unemployment   Under 2   2-   4-   6-   8-   10-   15-   20-   30-   40-   Total
Number of districts            17   40   37   26   20    38    21    16    13     8     236

might be replaced by a table of this kind:


A Cumulative Table showing the Number of Districts with
Unemployment Percentage below Certain Limits

Percentage Unemployment below    2    4    6     8    10    15    20    30    40    70
Number of districts             17   57   94   120   140   178   199   215   228   236


Alternatively, the table might be replaced by a table
of this kind:

A Cumulative Table showing the Number of Districts with
Unemployment Percentage above Certain Limits

Percentage Unemployment above     0     2     4     6     8    10    15    20    30    40
Number of districts             236   219   179   142   116    96    58    37    21     8
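Both cumulative tables follow from the grouped table by successive addition, and a few lines of calculation will reproduce them. The sketch below is an illustration under the assumption, taken from the "below" table, that the open last group ends at 70; the variable names are hypothetical.

```python
from itertools import accumulate

lower_limits = [0, 2, 4, 6, 8, 10, 15, 20, 30, 40]
upper_limits = [2, 4, 6, 8, 10, 15, 20, 30, 40, 70]   # 70 closes the open group
counts = [17, 40, 37, 26, 20, 38, 21, 16, 13, 8]

# Number of districts below each upper limit: a running total of the counts.
below = list(accumulate(counts))
total = below[-1]

# Number of districts above each lower limit: the total less those below it.
above = [total - already for already in [0] + below[:-1]]

for limit, number in zip(upper_limits, below):
    print(f"below {limit:>2}: {number}")
for limit, number in zip(lower_limits, above):
    print(f"above {limit:>2}: {number}")
```

The two rows always add to the total at a common limit: 140 districts below 10 per cent and 96 above it make up the 236.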


[Diagrams 9 and 10. Cumulative diagrams, with Number of Districts on the vertical scale.]

The figures in such cumulative tables are plotted to
make a cumulative diagram, by representing the numbers
on a linear scale. The variable quantity of which we
have the measurements (in this case Unemployment
Percentage) is scaled horizontally, the scale of numbers
is shown vertically. Points are plotted corresponding
to these pairs of values 2, 17; 4, 57; 6, 94; and so on, in
the one case, and to 0, 236; 2, 219; 4, 179; and so on,
in the other case. These points are joined together to
make a polygon outline as in Diagrams 9 and 10. It
must be emphasized that in this kind of diagram the
numbers are now plotted on a linear scale, and not on an
area scale as heretofore. The cumulative diagrams can





(1)    (2)   (3)      (1)    (2)   (3)      (1)    (2)   (3)
 0.3    2      2       6.9    4    107      16.1    1    181
 0.7    1      3       7.0    2    109      16.5    2    183
 1.1    1      4       7.1    1    110      16.6    2    185
 1.2    2      6       7.2    4    114      16.7    1    186
 1.4    1      7       7.4    1    115      16.9    1    187
 1.5    1      8       7.5    1    116      17.1    2    189
 1.6    3     11       7.6    1    117      17.4    1    190
 1.7    2     13       7.8    1    118      18.4    2    192
 1.8    2     15       7.9    2    120      18.5    1    193
 1.9    2     17       8.0    2    122      18.9    1    194
 2.0    1     18       8.1    1    123      19.0    2    196
 2.1    1     19       8.2    1    124      19.2    2    198
 2.2    1     20       8.4    2    126      19.9    1    199
 2.3    6     26       8.5    2    128      20.2    1    200
 2.4    6     32       8.7    2    130      20.5    1    201
 2.5    3     35       8.9    1    131      21.5    1    202
 2.7    1     36       9.0    1    132      21.7    1    203
 2.9    1     37       9.2    1    133      22.0    1    204
 3.0    1     38       9.3    1    134      22.5    1    205
 3.1    3     41       9.7    4    138      24.3    1    206
 3.2    2     43       9.8    1    139      24.5    1    207
 3.3    4     47       9.9    1    140      25.0    1    208
 3.4    1     48      10.0    1    141      25.2    1    209
 3.5    2     50      10.1    2    143      25.6    2    211
 3.6    2     52      10.2    1    144      26.0    1    212
 3.7    3     55      10.4    1    145      27.0    1    213
 3.8    1     56      10.5    2    147      28.6    1    214
 3.9    1     57      10.6    2    149      29.0    1    215
 4.0    2     59      10.7    1    150      30.1    1    216
 4.3    2     61      11.0    1    151      30.5    1    217
 4.4    6     67      11.3    1    152      31.2    1    218
 4.5    6     73      11.4    1    153      31.4    1    219
 4.6    1     74      11.6    3    156      31.8    1    220
 4.7    3     77      11.9    1    157      33.1    1    221
 4.8    1     78      12.0    2    159      33.5    1    222
 4.9    1     79      12.3    1    160      34.7    1    223
 5.0    2     81      12.5    2    162      37.1    1    224
 5.1    4     85      12.7    1    163      37.8    1    225
 5.2    1     86      12.8    1    164      38.6    1    226
 5.4    1     87      13.4    1    165      38.8    1    227
 5.5    3     90      13.6    1    166      39.0    1    228
 5.7    1     91      13.8    2    168      41.1    1    229
 5.8    2     93      13.9    2    170      43.3    1    230
 5.9    1     94      14.1    3    173      48.7    1    231
 6.0    1     95      14.2    2    175      53.5    1    232
 6.1    2     97      14.5    1    176      55.7    1    233
 6.2    1     98      14.6    1    177      58.6    1    234
 6.4    2    100      14.7    1    178      62.2    1    235
 6.5    1    101      15.4    1    179      69.5    1    236
 6.7    2    103      16.0    1    180

(1) = Unemployment percentage. (2) = Number of districts.
(3) = Summation.


be used to determine the number of districts with less
or more than a certain percentage of unemployment not
specified in the table. This number would correspond
in the diagram to the vertical ordinate (y) erected at the
appropriate point (x) on the horizontal scale. For instance,
in Diagram 9 we can read off when x = 12, y = 155; we
deduce that roughly 155 districts had an unemployment
percentage less than 12. Naturally, any result of this
kind will be only approximate. Alternatively, for a
given y we can find the corresponding x. Thus when
y = 118, x = 7.9, and we can say that half the districts
(118 out of 236) had less than 7.9 per cent unemployed,
and the other half had more than this percentage of
unemployment.
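Reading a value off the cumulative polygon is the same thing as interpolating along the straight line joining two plotted points. The sketch below does this arithmetically for the "below" table. The starting point (0, 0) is an assumption, since the text begins its list of plotted points at 2, 17, and the function names `count_below` and `value_at_count` are hypothetical.

```python
# Plotted points (percentage, number of districts below it).
# The first point (0, 0) is an assumption.
points = [(0, 0), (2, 17), (4, 57), (6, 94), (8, 120), (10, 140),
          (15, 178), (20, 199), (30, 215), (40, 228), (70, 236)]


def count_below(x):
    """Ordinate y of the polygon at the point x on the horizontal scale."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the range of the table")


def value_at_count(y):
    """Point x on the horizontal scale at which the polygon reaches y."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= y <= y1:
            return x0 + (x1 - x0) * (y - y0) / (y1 - y0)
    raise ValueError("y lies outside the range of the table")


print(round(count_below(12), 1))      # districts with less than 12 per cent
print(round(value_at_count(118), 2))  # percentage below which 118 districts fall
```

The calculation gives about 155 districts below 12 per cent, as read from the diagram, and a little under 7.9 per cent for the 118th district; a reading taken by eye from a drawn graph cannot be expected to agree more closely than this.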

A cumulative diagram is really an approximate
graphical representation of the group of values of the
character when these are arranged in order of size and the
number of cases summed successively. The table on p. 112,
giving the individual district unemployment percentages,
and the results of successive summation will make this clear.

In this table it should be observed that the results of the
summation corresponding to the different unemployment
percentages do not tally exactly with the corresponding
figures of the table from which Diagram 9 is constructed.
Thus in the complete summation table above there are
17 districts with a percentage 1.9 or less, 57 districts with
3.9 or less, and so on; whereas in the cumulative table on
p. 110 there are shown 17 districts with less than 2 per cent
unemployment, 57 districts with less than 4 per cent
unemployment, and so on. This apparent incompatibility
can be shown to be illusory if we realize that each
percentage has been worked out correct to the first
decimal figure only, and that a percentage of 3.9, for
instance, may represent any number from 3.85 to 3.95,
and a percentage of 4.0 may stand for any figure between
3.95 and 4.05. Consequently, we ought really, in our
cumulative table on p. 110, to change the unemployment
percentages shown there, and present the table in the
form:


Percentage under      1.95   3.95   5.95   7.95   9.95   14.95   19.95   29.95   39.95   69.95
Number of districts     17     57     94    120    140     178     199     215     228     236


The corresponding information in the complete
cumulative table (p. 112) would then read, for instance,
instead of 17 districts with a percentage of 1.9 or less, 17
districts with less than 1.95 per cent, and so on. From the
point of view of the graphical representation, this change
in the x unit has such a slight effect that it is hardly
noticeable in a diagram, and in practice the table on p. 110
is used as the basis of the graph instead of the table
above.

Diagram 11 shows the beginning of the cumulative
graph when the individual percentages are known. It is
constructed from the data of the table on p. 112 by
plotting the results of successive summation against the
corresponding unemployment percentage. In reality,
Diagram 9 is an approximation to Diagram 11. This last
gives the true facts relating to the percentages in different
districts, but the table on p. 110, which is the kind of
table which as a rule is available to us, gives only a
summary of the total information about the districts.
This table, then, supplies us with only a few of the points
of our real cumulative diagram; these are indicated
by small circles in Diagram 11. Straight lines joining
them will give us approximately the position of the
unknown part of the cumulative graph. In this way, we
get Diagram 9 as an approximation to Diagram 11.

[Diagram 11. Horizontal scale: Percentage Unemployment; vertical scale: Number of Districts.]

A cumulative diagram shows us then, by means of
vertical ordinates, the approximate number of the whole
group possessing less than (or more than) any measure
of the character represented on the horizontal scale. In
effect this kind of diagram puts in order of magnitude the
individuals of the group, and is very helpful in assisting
us to pick out measurements possessed by certain
individuals.

One of these of most value is the Median measurement,
which is used as a representative of the group.



Chapter 8 


THE MEDIAN AND MEASURES OF DISPERSION 

We have described how the average is used for purposes
of comparison; the average wage of a group of persons
is perhaps used to represent the group in a comparison
with another group. Similarly the median is used. The
median is an obvious measurement to take; it is that of
the middle individual in the group, when those in the
group are disposed in order according to the size of their
measurements of the particular character we are
considering. Moreover, if all the detailed measurements
of the group are known, the median is easily identified
without any calculation. Thus, if there are seven persons
of heights 5 ft. 7 in., 5 ft. 9 in., 5 ft. 10 in., 5 ft. 11 in.,
6 ft., 6 ft., 6 ft. 1 in., the median is 5 ft. 11 in., the height
of the middle person, when all are arranged in order of
height. In order to get the average we should have to
calculate the total height 41 ft. 2 in. and divide by 7,
giving 5 ft. 10½ in. It is seen that the median and
average are not necessarily the same, though in certain
cases they may be identical or approximately identical.
If there is no middle individual, as happens when there
is an even number in the whole group, the median is taken
as the average of the two middle measurements. Thus,
the median of the following group of wages 45s., 46s. 6d.,
47s., 48s., 48s. 6d., 49s., 49s., 50s., is 48s. 3d., half-way
between the two middle wages, 48s. and 48s. 6d. Generally,
then, practically half of a total group of individuals possess
the character to an extent less than the median measurement,
and the other half possess the character to an
extent greater than the median. Now we see how the
cumulative diagram can be used to identify the value of
the median, when there are large numbers in the group.
The ordinates of such a diagram, corresponding to values
of the character on the horizontal scale, indicate the
numbers in the group with less (or more) than those
values of the characteristic. Consequently, the vertical
ordinate corresponding to the median value in the
horizontal scale will be exactly half the total number in
the group. Thus, in Diagram 9, the vertical ordinate
indicating half of the total 236 corresponds to 7.9 in the
horizontal scale. The median percentage is 7.9. The
complete details on p. 112 show that the 118th district has
a percentage of 7.8, the 119th and 120th districts have
percentages 7.9, thus more accurately the median is
7.85. Diagram 9 is, as we pointed out, only approximately
correct, consequently any estimate of the
median, obtained from it, is itself necessarily only
approximate.
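The two small examples of heights and wages can be verified with the standard `statistics` module of Python. This is a sketch only: heights are converted to inches and wages to pence so that ordinary arithmetic applies, and the helper `shillings_pence` is a hypothetical convenience for printing.

```python
from statistics import mean, median

# Seven heights in inches: 5 ft. 7 in., 5 ft. 9 in., ..., 6 ft. 1 in.
heights_in = [67, 69, 70, 71, 72, 72, 73]

# Eight weekly wages in pence: 45s., 46s. 6d., 47s., 48s., 48s. 6d., 49s., 49s., 50s.
wages_pence = [540, 558, 564, 576, 582, 588, 588, 600]


def shillings_pence(pence):
    """Format an amount in pence as shillings and pence (12 pence to the shilling)."""
    shillings, odd_pence = divmod(pence, 12)
    return f"{int(shillings)}s. {odd_pence:g}d."


print(median(heights_in))          # the middle height of the seven
print(round(mean(heights_in), 2))  # total of 494 in. divided by 7

# With an even number in the group, median() averages the two middle values.
print(shillings_pence(median(wages_pence)))
print(shillings_pence(mean(wages_pence)))
```

The median height is 71 in. (5 ft. 11 in.) against an average of about 70.6 in., and the median wage is 48s. 3d. against an average of 47s. 10½d., so that in both groups the two representatives differ slightly.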

The median can also be obtained by calculation, without
drawing the cumulative graph, by reference to the
cumulative table itself. Thus, in the case of our
illustration, the table on p. 114 shows 94 districts with
percentages less than 5.95 and 120 with less than 7.95;
the middle 118th and 119th will be somewhere in this
group of 26 with percentages between 5.95 and 7.95.
Assuming that the percentages of these 26 districts are
evenly distributed between these limits, by a simple
proportion sum we identify the median as
$5.95 + \frac{118 - 94}{26} \times 2$, which is $5.95 + \frac{24}{26} \times 2$, giving 7.8 to
the nearest decimal place.

Let us consider another example. On p. 91, there
were shown figures relating to profit and loss sustained by
undertakings in the Coal Mining Industry. Without
constructing the whole cumulative table we can write
down the information we need in this form:

Number of Undertakings    Profit less than
286                       1/-
489                       3/-
Total 653

The middle undertaking is the 327th, which will be in
the group of 203 with more than 1s. but less than 3s.
profit. Assuming that the profits of these 203 are evenly
distributed in the range from 1s. to 3s., we identify the
median as $1\text{s.} + \frac{327 - 286}{203} \times 2\text{s.}$, which is $1\text{s.} + \frac{82}{203}\text{s.}$,
i.e. 1.4s. It will be noted, first, that in calculating the
median, none of the difficulties are encountered in this
case which were met with when the average was being
estimated. Secondly, the median and average are not
the same; the average was 1.1s.
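The proportion sum used in both examples has the same form: the lower limit of the group containing the middle individual, plus the fraction of the group lying below that individual multiplied by the width of the group. A minimal sketch follows; the function name `grouped_median` and its parameter names are hypothetical, and the rank 118 used for the districts is an assumption about which of the two middle districts is taken.

```python
def grouped_median(lower_limit, width, count_below, count_in_group, rank):
    """Median by simple proportion, assuming values spread evenly in the group."""
    return lower_limit + (rank - count_below) / count_in_group * width


# Districts: 94 below 5.95, 120 below 7.95, so 26 lie in a group of width 2.
districts_median = grouped_median(5.95, 2, 94, 120 - 94, 118)

# Coal undertakings: 286 below 1s., 489 below 3s.; the middle of 653 is the 327th.
coal_median = grouped_median(1.0, 2.0, 286, 489 - 286, 327)

print(round(districts_median, 1))   # percentage unemployment
print(round(coal_median, 1))        # shillings
```

The results, 7.8 per cent and 1.4s., agree with the figures worked out above.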

We have so far considered, in our processes of analysing
tabulated data, the calculation of averages for purposes
of making comparisons. These averages are used as
simple representatives of cumbersome data; by comparing
averages we can get crude comparisons between the
various pieces of information in our tables, but we realize,
of course, that by resorting to averages for this purpose
we have sacrificed a good deal of information at our disposal.
It is wise, therefore, whenever possible, to supplement
the information conveyed by an average with further
detail. For instance, in the table on p. 65 relating to
average wagon loads, the final figures for Great Britain
in July, 1929, and July, 1930, are 5.51 tons and 5.41 tons
respectively. But actually, as these figures are representative
of such widely different areas and classes of
freight, it is thought right to divide the country into a
number of districts and to divide the freight traffic into
the three classes shown in that table. Thus, the final
average figures are qualified by the other figures in the
table, and we see that the 5½ tons average figure is
representative of figures ranging from 2 tons to 12½ tons.

The table on p. 95, referring to income tax, again
shows how little information is actually conveyed by an
average; the incomes range from £2,000 to over £100,000
a year, the average is £5,750.

It is useful to us to have some idea of the amount of
variation in the group of measurements which we are
considering. No knowledge of this is conveyed by
an average. If we say that the average wage of a
group of men is 45s. there is nothing in this statement
to indicate whether this is an average of a group whose
wages range from 40s. to 50s., or of a group whose wages
range from 30s. to 70s. Without some indication of the
manner in which the characteristic is distributed
throughout the group, the average alone is a poor
representative, and any comparisons made solely on
averages will have the defect of crudeness.

The height of a person conveys to others a certain
amount of information about his size, but, if we also know
his chest measurement and waist measurement, we are
the more able to form a mental picture of this person
than from a knowledge of height alone. We have now
to consider other statistics of a group of measurements
besides the average.

In practice, as far as possible, when a representative
figure, such as an average, is chosen for a group, it is supplemented
by other figures which convey an idea of the extent
of variation in the group. So far we have considered the
use of the average and the median as representatives.
Just as there are these alternative possibilities in the
choice of a representative, and as we see later there are
other methods of representing a group, so there are
alternatives in our choice of figures to indicate the amount
of variation in the group.

One of the simplest methods is to use the actual range
of variation in the group. Thus on p. 112 there are
236 districts with unemployment percentage ranging from
0.3 to 69.5, a total range of 69.2. There are disadvantages
in using the range. In the first place we cannot, as a rule,
obtain it from the usual table. For instance, we cannot
get the range of variation in profit and loss in the Coal
Mining Industry from the table on p. 91, because the
lowest and highest figures are not specified in such a
table. Secondly the range depends on two only of the
values and, if one of the extreme values were very much
different from its nearest neighbouring values, the
inclusion of that value in the group might make a
considerable difference to the range. If the highest
value 69.5 were not in the group of Unemployment
Percentages, the range would be altered to 61.9, which
shows a great change from 69.2. At the same
time the general disposition of the other members of
the group has remained unaltered. We want a figure
which will give us an idea of the dispersal of the measurements
in the group, and the range itself for these reasons
is not considered generally suitable.

It is preferable to rely upon a figure which depends for
its size on the general disposition of the measurements of
the group, and not merely on the two extreme values. The
general variation in the group can be indicated by
means of the differences between the individual measurements
and the representative figure. These differences
are called Deviations.

If the average is the representative figure, the differences
between the individual figures and the average are the
deviations; some of these are necessarily positive and
some are negative, and the sum of all the deviations is zero.
If the median is the representative figure, the deviations
are again positive and negative, but the sum of them is
not necessarily zero.

A useful measure of the extent of the variation in the
group is the average deviation or mean deviation. This
is simply obtained by considering the deviations (without
signs) as a group of measurements and getting the average
of them. Obviously, if there is large variation
amongst the individual measurements, there will be large
deviations from the representative figure, and the mean
deviation will be large. If the measurements in the
group are all very close to one another, they will also be
close to their representative, the deviations will all be
small and the mean deviation will also be small. Thus
the mean deviation will serve as an indication of the
extent of little or great variation in the whole group.
Moreover to its calculation all members in the group will
contribute something.

There are two measurements of this kind: the mean
deviation from the average and the mean deviation from
the median. Thus if we had a small group of wages:
45s., 46s. 6d., 47s., 48s., 48s. 6d., 49s., 49s., 50s., of which
the average is 47s. 10½d., and the median 48s. 3d., the
deviations are:

                          From the    From the
                          average     median
                          -2/10½      -3/3
                          -1/4½       -1/9
                          - /10½      -1/3
                          + /1½       - /3
                          + /7½       + /3
                          +1/1½       + /9
                          +1/1½       + /9
                          +2/1½       +1/9

Sum of deviations
(ignoring signs)          10/3        10/0

Mean deviation            1/3⅜        1/3
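The figures in this table can be reproduced by working in pence. The following sketch is offered only as a check on the arithmetic; the function name `mean_deviation` is hypothetical.

```python
from statistics import mean, median

# The eight wages in pence: 45s., 46s. 6d., 47s., 48s., 48s. 6d., 49s., 49s., 50s.
wages_pence = [540, 558, 564, 576, 582, 588, 588, 600]


def mean_deviation(values, representative):
    """Average of the deviations from the representative, signs ignored."""
    return sum(abs(v - representative) for v in values) / len(values)


average = mean(wages_pence)     # 574.5 pence, i.e. 47s. 10.5d.
middle = median(wages_pence)    # 579 pence, i.e. 48s. 3d.

# With signs retained, the deviations from the average cancel exactly,
# while those from the median need not.
signed_sum_average = sum(v - average for v in wages_pence)
signed_sum_median = sum(v - middle for v in wages_pence)

total_from_average = sum(abs(v - average) for v in wages_pence)   # 10s. 3d.
total_from_median = sum(abs(v - middle) for v in wages_pence)     # 10s. 0d.

print(signed_sum_average, signed_sum_median)
print(total_from_average, total_from_median)
print(mean_deviation(wages_pence, average), mean_deviation(wages_pence, middle))
```

The totals are 123 pence (10s. 3d.) from the average and 120 pence (10s.) from the median, giving mean deviations of 15.375 pence and 15 pence respectively, that is, a little over 1s. 3d. and exactly 1s. 3d.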


The calculation of the mean deviation, when the
information regarding the group is given in the form of a
table, again presents difficulties, just as did the calculation
of the average. Let us consider as an illustration the
data representing the marks obtained by 133 candidates in
an examination, shown on p. 84.

The average was estimated at 58.2 marks. We know
the facts set out in the table below:

                                              Number of
                                              Candidates
Groups with negative deviations from the
average ranging between
    23.2 & 19.2                                    1
    18.2 & 14.2                                    5
    13.2 & 9.2                                    12
    8.2 & 4.2                                     34

Group with both positive & negative
deviations                                        23

Groups with positive deviations from the
average ranging between
    1.8 & 5.8                                     22
    6.8 & 10.8                                    23
    11.8 & 15.8                                   12
    16.8 & 20.8                                    1


We assume, as before, for purposes of obtaining an
approximation to the mean deviation, that the average
deviation in the constituent groups is half-way between
the limits between which the deviations range. Thus,
we assume that, of the 34 candidates with deviations
ranging between 8.2 and 4.2, the average deviation is
6.2 and therefore obtain an estimate 34 × 6.2 for the
contribution of this group to the total deviation, from
which we are to get the mean deviation. We further
assume that the middle group of 23, containing individuals
some of whom have negative deviations and some of whom
have positive deviations from the average, can be split
up into two, the one containing those individuals with
marks ranging from the lower limit of the group to the
average, and the other containing those with marks
ranging from the average to the upper limit of the group,
and that the numbers in these two groups can be estimated
by simple proportion, based on the assumption that the
whole 23 are evenly distributed between the limits of the
group. Thus, since the range of marks in this group is 5
and the central mark is 57, we assume the existence of a
group, with negative deviations, with marks from 54½ to 58.2,
a range of 3.7, and a group, with positive deviations,
with marks from 58.2 to 59½, a range of 1.3. (It will be
observed that we are taking the range of the whole group
as from 54½ to 59½; this is consistent with the original
marks given to the nearest whole number. A mark of
55 really stands for any mark between 54½ and 55½;
if 300 were the maximum it is possible that three
candidates, allotted 55 marks when the maximum is 100,
might be awarded 165, 164, 166, corresponding to 55.0,
54.7, 55.3.) We split this group of 23 candidates into
two parts, assuming that 17 have negative deviations from
0 to 3.7, and 6 have positive deviations from 0 to 1.3
(17 is $\frac{3.7}{5} \times 23$ and 6 is $\frac{1.3}{5} \times 23$, to the nearest whole number).
Then we do as before, and
assume that the 17 candidates have an average deviation
of 1.85 (half-way between 0 and 3.7) and the 6 candidates
have an average deviation of 0.65 (half-way between 0 and
1.3). Thus for the calculation of the mean deviation we
have the following table:




              Average                     Total
              deviation    Candidates    deviation
Negative:       21.2            1           21.2
                16.2            5           81.0
                11.2           12          134.4
                 6.2           34          210.8
                 1.85          17           31.45
Positive:        0.65           6            3.90
                 3.8           22           83.6
                 8.8           23          202.4
                13.8           12          165.6
                18.8            1           18.8

                              133          953.15


Errors enter into the calculation of the total deviation
due to the average mark of those in a small 5-mark group
being assumed half-way along the range of variation in
the group. These errors tend to cancel when the average
is being calculated, but are cumulated when the mean
deviation is being obtained. An approximate correction
on this account has been calculated theoretically, and
consists in subtracting from the total deviation so far
obtained, $\frac{1}{6}$ × range of the small groups × the number in
that middle group in which the individual marks range
between limits which include the average. This correction
in our case is $-\frac{1}{6} \times 5 \times 23 = -19.17$. Thus the total
deviation is $953.15 - 19.17 = 933.98$.
The mean deviation is $\frac{933.98}{133} = 7.0$.

The closeness of this approximation may be tested by
reference to the correct mean deviation calculated from
the original details shown on p. 87. The average is 58.0
(p. 88), and the deviations, with the number of candidates
having these deviations from the average, are shown
on p. 127, giving a mean deviation of 6.6. Thus the
estimate made from the table, 7.0, is 0.4 too great.
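The whole estimate, including the splitting of the middle group and the correction, can be set out as a short calculation. This is a sketch under stated assumptions: the 5-mark groups 35-39, 40-44, and so on are inferred from the ranges of deviation tabulated above rather than copied from the table on p. 84, the split of the middle group is rounded to whole candidates as in the text, and all variable names are hypothetical.

```python
AVERAGE = 58.2   # estimated average mark
WIDTH = 5        # each group covers 5 marks

# (lowest mark in group, number of candidates); groups inferred from the text.
groups = [(35, 1), (40, 5), (45, 12), (50, 34), (55, 23),
          (60, 22), (65, 23), (70, 12), (75, 1)]

total_deviation = 0.0
candidates = 0
middle_count = 0

for lowest_mark, count in groups:
    # Marks are given to the nearest whole number, so a group 55-59
    # really extends from 54.5 to 59.5.
    low, high = lowest_mark - 0.5, lowest_mark + WIDTH - 0.5
    candidates += count
    if low < AVERAGE < high:
        # Middle group: split by simple proportion, rounded as in the text.
        middle_count = count
        below = round(count * (AVERAGE - low) / WIDTH)
        above = count - below
        total_deviation += below * (AVERAGE - low) / 2
        total_deviation += above * (high - AVERAGE) / 2
    else:
        # Other groups: every candidate is placed at the middle of the group.
        total_deviation += count * abs((low + high) / 2 - AVERAGE)

# Correction: one sixth x range of the groups x number in the middle group.
correction = WIDTH * middle_count / 6
mean_deviation = (total_deviation - correction) / candidates

print(round(total_deviation, 2), round(correction, 2), round(mean_deviation, 1))
```

The calculation reproduces the total deviation of 953.15, the correction of 19.17 and the estimated mean deviation of 7.0 for the 133 candidates.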

Another method of measuring the amount of variation
in a group of measurements is by means of the Standard
Deviation. This measure is calculated as a representative
of the individual deviations from the average in the
following way: the deviations from the average are
squared and these squares are averaged, the square root
of this average is the standard deviation. Thus, if we had
seven measurements, 8, 6, 9, 7, 7, 10, 9, of which the
average is 8, the deviations from the average are 0, -2,
+1, -1, -1, +2, +1, the squares of these, 0, 4, 1,
1, 1, 4, 1, are averaged, $\frac{12}{7} = 1.714$. The standard deviation

Negative 

Deviations 

No. of 
Candidates 

Total 

Deviation 

20 

1 

20 

16 

2 

32 

14 

3 

42 

13 

1 

13 

12 

1 

12 

11 

1 

11 

10 

4 

40 

9 

5 

45 

8 

6 

48 

7 

4 

28 

6 

3 

18 

5 

7 

35 

4 

14 

56 

3 

10 

30 

2 

3 

6 

1 

5 

5 

0 

4 

— 



441 

1 

1 

1 

2 

6 

12 

3 

6 

18 

4 

2 

8 

5 

7 

35 

6 

1 

6 

7 

9 

63 

8 

6 

48 

9 

4 

36 

10 

4 

40 

12 

6 

72 

13 

3 

39 

14 

1 

14 

15 

2 

30 

19 

1 

19 



441 


Total deviation 882 


Mean deviation = 6*6 



128 


ELEMENTARY STATISTICAL METHODS 


is the square root of 1*714, i.e. 1*36. Generally, if x^, x^, 
ATg, . . . x„ are the n measurements in the group of which the 

X “1^ X X 

average is ^ { = — -) the deviations from the 


average are — x, x^ — x, ... X n — x, and the st andard 
deviation is 


j/(fi 


- xf + {x^ -x)^ + • . - xY 


Just 


n 


as in the calculation of the mean deviation, we get 
difficulties occurring in the calculation of the standard 


deviation when we want to obtain this from data given to 
us in tabular form, owing to the grouping together of 
individual measurements of different sizes but coming 
between the same limits. Again, for purposes of obtaining 
an approximation, we assume, as we did when calculating 
the average and the mean deviation, that the individuals 
in a group have the same deviation from the average, the 
difference between the middle value of the variable in the 
appropriate range, and the average. Thus, taking ag^ 
as an illustration the distribution of candidates according 
to marks obtained in an examination, we proceed in 


this way ; — 


            Assumed                      Assumed          For Total
            Average       Number of      Deviation        Squares of
Marks       Marks of      Candidates     from Average     Deviations
            Groups                       (58.2)
 35-          37              1            - 21.2          1 × 21.2²
 40-          42              5            - 16.2          5 × 16.2²
 45-          47             12            - 11.2         12 × 11.2²
 50-          52             34            -  6.2         34 ×  6.2²
 55-          57             23            -  1.2         23 ×  1.2²
 60-          62             22            +  3.8         22 ×  3.8²
 65-          67             23            +  8.8         23 ×  8.8²
 70-          72             12            + 13.8         12 × 13.8²
 75-          77              1            + 18.8          1 × 18.8²




In order to save trouble in the calculations we adopt, at this stage, a simple device derived from the formula for the standard deviation. This latter is the square root of

$$\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n},$$

which may be written as

$$\frac{x_1^2 + x_2^2 + \dots + x_n^2}{n} - \frac{2\bar{x}\,(x_1 + x_2 + \dots + x_n)}{n} + \bar{x}^2,$$

but since $\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n}$, the above reduces to

$$\frac{x_1^2 + x_2^2 + \dots + x_n^2}{n} - \bar{x}^2.$$

Thus if we measure our $x$'s from any zero whatsoever and calculate $\dfrac{x_1^2 + x_2^2 + \dots + x_n^2}{n}$, then subtracting $\bar{x}^2$ will give us the desired quantity from which to extract the standard deviation. Instead, therefore, of calculating the sum of the quantities in the last column of the above table we calculate $x_1^2 + x_2^2 + \dots + x_n^2$ from $x$'s measured from 57 as we did on p. 89.
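That the device gives the same result whatever zero is chosen can be confirmed on the seven measurements used earlier (8, 6, 9, 7, 7, 10, 9). The sketch below is illustrative only; the function name `shortcut_variance` and the trial origins are assumptions introduced for the purpose.

```python
from math import sqrt

measurements = [8, 6, 9, 7, 7, 10, 9]   # the seven measurements used earlier
n = len(measurements)
mean = sum(measurements) / n

# Direct definition: the average of the squared deviations from the average.
direct = sum((x - mean) ** 2 for x in measurements) / n

def shortcut_variance(values, origin):
    """Mean of squares about an arbitrary origin, less the squared mean about that origin."""
    shifted = [v - origin for v in values]
    mean_shifted = sum(shifted) / len(shifted)
    return sum(s * s for s in shifted) / len(shifted) - mean_shifted ** 2

# The same quantity is obtained from any origin whatsoever.
for origin in (0, 7, 57):
    assert abs(shortcut_variance(measurements, origin) - direct) < 1e-9

standard_deviation = sqrt(direct)
print(round(direct, 3), round(standard_deviation, 2))
```

An origin near the middle of the data keeps the squares small, which is the whole point of the device when the work is done by hand.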


Values of       Number of
    x           Candidates      For total x²
  - 20               1           1 × 20² =  1 × 5² × 16 = 5² ×  16
  - 15               5           5 × 15² =  5 × 5² ×  9 = 5² ×  45
  - 10              12          12 × 10² = 12 × 5² ×  4 = 5² ×  48
  -  5              34          34 ×  5² = 34 × 5² ×  1 = 5² ×  34
     0              23           0
  +  5              22          22 ×  5² = 22 × 5² ×  1 = 5² ×  22
  + 10              23          23 × 10² = 23 × 5² ×  4 = 5² ×  92
  + 15              12          12 × 15² = 12 × 5² ×  9 = 5² × 108
  + 20               1           1 × 20² =  1 × 5² × 16 = 5² ×  16

                   133                                    5² × 381

$\bar{x} = 5 \times 0.233$ (see p. 89).   $381 \div 133 = 2.865$.



For the square of the standard deviation we have then:

25 × 2.865 - 25 × 0.233², which is 25 × 2.865 - 25 × 0.054, i.e. 25 × 2.811.

A sensible correction has now to be applied to counterbalance the assumption that the middles of the ranges of the groups in the table are identical with the average marks of these groups, just as a similar correction was necessary in the case of the mean deviation. This is Sheppard's correction, and it consists of subtracting 1/12th of the square of the range of the groups; in our illustration we therefore subtract 1/12 × 5². This gives 25 × 2.811 - 25 × 0.083 = 25 × 2.728.

The standard deviation is 5√2.728 = 5 × 1.649 = 8.2 marks.
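The whole of this calculation, including Sheppard's correction, can be retraced in a few lines. The following is a hedged sketch under the assumptions stated in the comments; the names are illustrative. Carried at full precision the result is about 8.26 marks, slightly above the 8.2 reached in the rounded hand working.

```python
from math import sqrt

WIDTH = 5   # range of each mark group
# Group mid-marks measured from 57, in units of 5 marks, with the candidate counts.
steps = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
counts = [1, 5, 12, 34, 23, 22, 23, 12, 1]

n = sum(counts)
sum_u = sum(u * c for u, c in zip(steps, counts))
sum_u2 = sum(u * u * c for u, c in zip(steps, counts))

mean_u = sum_u / n                       # about 0.233
variance_u = sum_u2 / n - mean_u ** 2    # about 2.81, in units of 5 squared
corrected_u = variance_u - 1 / 12        # Sheppard's correction in the same units
standard_deviation = WIDTH * sqrt(corrected_u)

print(n, sum_u2, round(mean_u, 3), round(standard_deviation, 2))
```

Working in units of the group width means the correction is simply one-twelfth, and the factor of 5 is restored only at the last step.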

We can compare this result with the standard deviation obtained from the actual deviations (see p. 127).

Negative         Number of
Deviations       Candidates      (Deviations)²
   20                1               400
   16                2               512
   14                3               588
   13                1               169
   12                1               144
   11                1               121
   10                4               400
    9                5               405
    8                6               384
    7                4               196
    6                3               108
    5                7               175
    4               14               224
    3               10                90
    2                3                12
    1                5                 5
    0                4                 0

Positive         Number of
Deviations       Candidates      (Deviations)²
   19                1               361
   15                2               450
   14                1               196
   13                3               507
   12                6               864
   10                4               400
    9                4               324
    8                6               384
    7                9               441
    6                1                36
    5                7               175
    4                2                32
    3                6                54
    2                6                24
    1                1                 1


Total (deviation)² is 8182. The average (deviation)² is 8182/133 = 61.52. The standard deviation is √61.52 = 7.8.

The standard deviation calculated from the table was 8.2, 0.4 too great.¹

¹ Although the mean deviation may be calculated when the deviations are taken from the average or the median, the standard deviation is only computed from deviations from the average.

Quartile Deviation

A further method of measuring the amount of variation in the group is by means of the Quartile Deviation.

This measure is associated with the median, which is sometimes used as a representative of the group. The median is the measurement of the middle individual in the group when all are arranged in order. The quartile measurements are similarly those of the individuals standing half-way between the extremes and the median. The median and quartiles divide the whole group into four equal parts. The quartiles are referred to as Lower Quartile and Upper Quartile, the lower quartile being less than the median, which in turn is less than the upper quartile. If there are 103 individual measurements in a group arranged in order of size, starting with the smallest, the lower quartile is the measurement of the 26th, the median is the measurement of the 52nd, the upper quartile is that of the 78th. Sometimes difficulties are encountered in the location of the median and quartiles because there are no individuals in the half or quarter positions. Thus, if there were 100 units in the group, the median would be assumed to be half-way between the 50th and 51st measurements, the upper quartile would be taken as half-way between the 75th and 76th measurements, and the lower quartile half-way between the 25th and 26th. If there were 102 units in the whole group, the lower quartile would be the measurement of the 26th individual, the median would be half-way between the 51st and 52nd, and the upper quartile would be the measurement of the 77th individual.

The Quartile Deviation is defined as half the difference between the upper and lower quartiles.
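The rule for locating the three positions can be stated compactly: the quartiles are the middle positions of the lower and upper halves of the ordered group, the median itself being left out of both halves when the number of units is odd. The sketch below is one way of writing that rule down; the function names and the sample data are assumptions for illustration, and a rank ending in .5 stands for "half-way between the two neighbouring measurements".

```python
def quartile_positions(n):
    """Ranks (1-based, possibly ending in .5) of lower quartile, median, upper quartile."""
    median = (n + 1) / 2
    half = n // 2                  # size of each half, omitting the median when n is odd
    lower = (half + 1) / 2
    upper = (n - half) + lower
    return lower, median, upper

def value_at(sorted_values, rank):
    """Measurement at a 1-based rank; a .5 rank averages the two neighbours."""
    below = int(rank)
    if rank == below:
        return sorted_values[below - 1]
    return (sorted_values[below - 1] + sorted_values[below]) / 2

# Hypothetical ordered group of 103 measurements: 1, 2, ..., 103.
sample = list(range(1, 104))
lo_rank, med_rank, up_rank = quartile_positions(len(sample))
quartile_deviation = (value_at(sample, up_rank) - value_at(sample, lo_rank)) / 2
print(quartile_positions(103), quartile_positions(100), quartile_positions(102))
```

Other conventions for quartile positions exist and give slightly different ranks; this one is chosen only because it reproduces the positions named in the text for 100, 102 and 103 units.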

As an illustration we may refer to the table on p. 112, where there are 236 units in the group. The median is the measurement half-way between the 118th and 119th measurements, 7.85. The lower quartile is half-way between the 59th and 60th measurements; the 59th is 4.0, the 60th is 4.3, and we take the lower quartile as 4.15. The upper quartile is half-way between the 177th and 178th measurements, which are 14.6 and 14.7; the upper quartile is 14.65. The quartile deviation is

$$\frac{14.65 - 4.15}{2} = 5.25.$$

The position of the quartiles in the scale of measurements can be approximately located either from the cumulative diagram, as in the case of the median, or by calculation from the cumulative table. Thus Diagram 9 shows that the lower quartile is 4.3 per cent and the upper quartile 15.2 per cent. The calculation from the cumulative table uses the following extracts from the table on p. 114:

Percentage      Number of       Percentage      Number of
under           Districts       under           Districts
  3.95             57              9.95            140
  5.95             94             14.95            178

The lower quartile, between the 59th and 60th measurements, is

$$3.95 + \frac{59\tfrac{1}{2} - 57}{94 - 57} \times 2 = 3.95 + \frac{2\tfrac{1}{2}}{37} \times 2 = 3.95 + 0.13 = 4.1,$$

and the upper quartile, between the 177th and 178th measurements, is

$$9.95 + \frac{177\tfrac{1}{2} - 140}{178 - 140} \times 5 = 9.95 + \frac{37\tfrac{1}{2}}{38} \times 5 = 9.95 + 4.93 = 14.9.$$

We have then the following results:

                        From the        From the        From the
                        Original        Cumulative      Cumulative
                        Data            Diagram         Table
Lower Quartile            4.15            4.3             4.1
Upper Quartile           14.65           15.2            14.9
Quartile deviation        5.25            5.45            5.4


If we take as a further illustration the table of marks of the 133 candidates, and use the cumulative table, which is:

Marks below              40    45    50    55    60    65    70    75    80
Number of Candidates      1     6    18    52    75    97   120   132   133

The median is the number of marks obtained by the 67th candidate, when they are arranged in order of merit, and is given by

$$55 + \frac{67 - 52}{75 - 52} \times 5 = 55 + \frac{15}{23} \times 5 = 58.3.$$

The lower quartile is

$$50 + \frac{33\tfrac{1}{2} - 18}{52 - 18} \times 5 = 50 + \frac{15\tfrac{1}{2}}{34} \times 5 = 52.3.$$

The upper quartile is

$$65 + \frac{100\tfrac{1}{2} - 97}{120 - 97} \times 5 = 65 + \frac{3\tfrac{1}{2}}{23} \times 5 = 65.8.$$

The quartile deviation is 6.7 marks.
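The interpolation within a group is the same operation each time, and may be written once and applied to the three ranks. The sketch below assumes, as the hand working does, that the candidates are spread evenly within each 5-mark group; the function name `interpolate` is an illustrative assumption.

```python
upper_limits = [40, 45, 50, 55, 60, 65, 70, 75, 80]      # "marks below"
cumulative = [1, 6, 18, 52, 75, 97, 120, 132, 133]       # candidates below each limit
WIDTH = 5

def interpolate(rank):
    """Marks of the candidate at the given rank, by linear interpolation within a group."""
    for i, through in enumerate(cumulative):
        if rank <= through:
            before = cumulative[i - 1] if i > 0 else 0
            lower_limit = upper_limits[i] - WIDTH
            return lower_limit + (rank - before) / (through - before) * WIDTH
    raise ValueError("rank exceeds the number of candidates")

median = interpolate(67)
lower_quartile = interpolate(33.5)
upper_quartile = interpolate(100.5)
quartile_deviation = (upper_quartile - lower_quartile) / 2

print(round(median, 1), round(lower_quartile, 1),
      round(upper_quartile, 1), round(quartile_deviation, 1))
```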

It is worth while collecting these results, in order to stress the difference between them.

                                          Marks
Average                                   58.2
Mean deviation from the average            7.0
Standard deviation                         8.2

Median                                    58.3
Quartile deviation                         6.7


It is necessary to emphasize the fact that these various constants are different. They are expected to be different. They are obtained after submitting the tabulated material to different processes, and different results are likely. It is not usual to work out all these measures in a particular case. If we desire to indicate by means of a single figure the extent of the variation in a group, we should find one only of these measures. The various measures which have been described are used on different occasions. Sometimes the tabulated material is in such a form that it is preferable to use the quartile deviation; indeed, this measure may be the only one of its kind which is capable of calculation. With other material it may be preferable to obtain the standard deviation, when its calculation is possible. But there is a danger into which the unwary may be led, due to this embarrassing choice of measures of variation. Naturally we only obtain a measure of variation so that we can readily compare the group from this point of view with another group. We should always see that, when making comparisons, we are comparing the same kind of measures with one another. It would be wrong to compare the standard deviation of one group with the quartile deviation of another. The quartile deviation and standard deviation of the same group differ sensibly, and such a comparison as that suggested would be vitiated from the start. It is just as wrong to do this as it would be wrong, in comparing two persons as to size, if the height were taken as the measure of size in one case, and the chest measurement were used as the measure of size in the other case.



Chapter 9 

WEIGHTED SUMS AND WEIGHTED AVERAGES

So far in our discussion of the contributions to the total by the individual members of a whole group we have confined ourselves to the cases where each member is considered to have the same importance as the others. Now we have to deal with those cases where this is not so. There are times when we wish to obtain a total for a group of units, when these units may be considered as equivalent from some points of view but are not so from other points of view. For instance we may be concerned with a group of 200 persons travelling on a particular train. If we think of these as 200 "souls" each unit is equivalent to all the others; if we think of them from the point of view of the Railway Company we may wish to obtain an idea of the seating capacity of the train required to carry them, and neglect the "infants in arms" and perhaps arrive at a total of 180 persons, or we may have to think of them as first class ticket holders, or third class ticket holders, or as persons travelling at half rates (e.g. children) or as persons travelling at special rates (e.g. workmen's ticket holders or cheap fare ticket holders). Thus a whole group may be considered homogeneous in one sense, and at the same time heterogeneous in other senses, and we are led to the notion of expressing certain of the members of the group in terms of "equivalent units", in order to get a total for the group which can be considered properly comparable with some other similar total.



We may consider an example of this kind of treatment supplied by the population of England and Wales. In 1881 the total population was 25,974,000, in 1921 it was 37,887,000, an increase of 46 per cent. Dr. E. C. Snow (Journal of the Royal Statistical Society, 1929, part iii, p. 333), considering the members of the population as consumers of commodities, and realizing that the consumption-demand varies somewhat with age, suggests a scale, representing the average equivalent consumption-demand for persons in different age-groups, taking unity as the maximum consumption-demand (for persons aged 30). The scale is shown below:

Males or Females, Age            0-14   15-29   30-44   45-59   60-74   75-90
Equivalent Consumption-demand     .19     .81     .95     .68     .32     .06


Thus he suggests that the average consumption-demand of children under 15 is .19, that of persons aged 30 being taken as unity. If we divide the population into age-groups and apply these figures as multipliers we shall obtain totals, for the two years, which may be compared. The details are shown in the following table:

              Males and Females      Equivalent       Consumption-demand
                                     Consumption-           Units
  Age          1881        1921      demand            1881        1921
              (000)       (000)                       (000)       (000)
  0-14        9,469      10,500        .19            1,800       1,995
 15-29        6,923       9,615        .81            5,600       7,790
 30-44        6,686       8,148        .95            6,350       7,740
 45-59        2,981       6,051        .68            2,020       4,110
 60-74        1,580       2,925        .32              505         935
 75-90          336         648        .06               20          40

             25,974      37,887                      16,295      22,610

Thus the populations at these two dates consist of 16,300,000 and 22,600,000 consumption-demand units, i.e. the two populations are equivalent to these numbers of persons aged 30. The increase from 1881 to 1921 is 39 per cent, which differs considerably from the 46 per cent change in the actual numbers.

Again, in the London Survey, already referred to on p. 9, the following scale is used to indicate housing needs of different persons. Men aged 18 and over and women aged 16 and over are taken as adult units; boys 14 to 17 and girls of 14 and 15 are considered as 3/4 of these units; children of both sexes from 5 to 13 are taken as 1/2 units, and infants 0 to 4 as 1/4 units. We can get the total number of equivalent adults from the table below:



                                                         Equivalent   Equivalent
                               1881     1921    Scale    Adults       Adults
                               mns.     mns.             1881         1921
Male and Female 0-4             3.5      3.3     1/4      0.88         0.82
Male and Female 5-13            5.4      6.4     1/2      2.7          3.2
Males 14-17                     1.0      1.4     3/4      0.75         1.05
Females 14-15                   0.5      0.7     3/4      0.38         0.53
Males 18 and over and
  Females 16 and over          15.5     26.0     1       15.5         26.0

                               26.0     37.9             20.2         31.6


Thus the comparative number of equivalent adults, from the point of view of housing needs, has changed from 20.2 millions to 31.6 millions, an increase of 51 per cent.

The number of "persons" has increased by 46 per cent between these two dates; considered as consumers the increase is only 39 per cent; considered from the point of view of housing needs the increase is 51 per cent.

Two further illustrations may be quoted of the necessity for regarding the contributions of the different members of a group as being of different relative importance.

When the total output of Coal in Germany is being considered for the purpose of making comparisons, Lignite is either separated from Bituminous Coal or included with it by expressing it in terms of Bituminous. Thus, in the Report of the Royal Commission on the Coal Industry (1925), p. 243, we read: "The additional 64 million tons of Lignite raised in 1925 represents, in terms of Bituminous Coal, at least 15 million tons, or substantially more than the reduction in consumption of Bituminous Coal in the country."

A further illustration is taken from the Special Memorandum No. 8, The Physical Volume of Production, by J. W. F. Rowe, of the London and Cambridge Economic Service. When discussing the question of preparing an Index of Production for the Paper, Printing, and Allied Trades, using imports of raw materials, the author says: "The different imported raw materials do not however all result in the same paper equivalent. The British Paper Makers Association have suggested the following method of estimating the total paper equivalent. Deduct one half the weight of net wet pulp imports (both chemical and mechanical), and one-tenth the weight of dry mechanical. This gives absolutely dry pulp equivalent, to which should be added dry chemical imports. Absolutely dry pulp yields 90 per cent of its weight as paper. Esparto and other fibres also yield approximately 90 per cent of paper."


The following table shows how this method of obtaining the total equivalent weight of paper has been applied to statistics of imports for the first six months of 1928. The monthly trade returns for June, 1928, give the original figures.


                                             Scale of     Dry         Scale of
Net Imports 1928 (Jan.-June)                 Dry Pulp     Pulp        Paper       Paper
                                             Equi-        Equi-       Equi-       Equi-
                                             valent       valent      valent      valent
Pulp:
  Chemical Dry (tons)           205,797        1.0        205,797
  Chemical Wet (tons)             9,136         .5          4,568
  Mechanical Dry (tons)           1,153         .9          1,038
  Mechanical Wet (tons)         316,042         .5        158,021

  Total Pulp                    532,128                   369,424        .9       332,482
Esparto and other Fibres (tons) 156,160                                  .9       140,544

                                                                                  473,026


Thus the total net imports are equivalent to 473,000 tons of paper. The total tonnage of Paper Making Materials may be interesting, from the point of view of the transport of these materials from one place to another, but from the point of view of the paper trade it is necessary to consider this other total of paper equivalent.

These examples sufficiently illustrate the idea that, on occasion, the constituent parts of a total drawn from the different members must be considered not of the same relative degree of importance, but as definitely different, and that a scale indicating this must be introduced before the necessary total can be arrived at. Such aggregates are referred to as weighted sums; the relative importance of the different items is indicated by "weights". Generally, if the items are $I_1, I_2, \dots, I_n$, and the weights are $W_1, W_2, W_3, \dots, W_n$, corresponding to these items, the weighted sum is $I_1W_1 + I_2W_2 + \dots + I_nW_n$. In the above illustrations the various multipliers which have been used are called "Weights".
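As a concrete instance of the weighted sum, the consumption-demand units of the two census years may be recomputed from the age-group populations and Dr. Snow's scale as printed in the table above. This is a minimal sketch; the function name `weighted_sum` is an assumption, and the results differ from the printed totals only through the rounding of the separate products in the hand working.

```python
scale = [0.19, 0.81, 0.95, 0.68, 0.32, 0.06]             # equivalent consumption-demand
population_1881 = [9469, 6923, 6686, 2981, 1580, 336]    # thousands, as printed
population_1921 = [10500, 9615, 8148, 6051, 2925, 648]   # thousands, as printed

def weighted_sum(items, weights):
    """I1*W1 + I2*W2 + ... + In*Wn."""
    if len(items) != len(weights):
        raise ValueError("items and weights must correspond one to one")
    return sum(i * w for i, w in zip(items, weights))

units_1881 = weighted_sum(population_1881, scale)
units_1921 = weighted_sum(population_1921, scale)
increase_per_cent = (units_1921 / units_1881 - 1) * 100

print(round(units_1881), round(units_1921), round(increase_per_cent))
```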

Similarly when we are averaging a series of items of different degrees of importance we use a "weighted average", which is obtained by relating the weighted sum to the total of the weights. Thus the weighted average is

$$\frac{I_1W_1 + I_2W_2 + \dots + I_nW_n}{W_1 + W_2 + \dots + W_n}.$$

It will be observed that the ordinary simple average is a special case of the weighted average, when the weights are all equal to unity. In this case the weighted average becomes

$$\frac{I_1 + I_2 + \dots + I_n}{n},$$

$n$ being the number of items.
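The special case is easily verified. In the sketch below the three items and their weights are hypothetical figures chosen only for illustration, and `weighted_average` is an assumed name.

```python
def weighted_average(items, weights):
    """Weighted sum of the items divided by the total of the weights."""
    if len(items) != len(weights):
        raise ValueError("items and weights must correspond one to one")
    return sum(i * w for i, w in zip(items, weights)) / sum(weights)

items = [4, 6, 10]       # hypothetical items
weights = [1, 2, 3]      # hypothetical weights

simple = sum(items) / len(items)
with_unit_weights = weighted_average(items, [1] * len(items))
with_weights = weighted_average(items, weights)

print(simple, with_unit_weights, round(with_weights, 2))
```

With unit weights the divisor is simply the number of items, so the two averages coincide; with the unequal weights the larger items count for more and the average rises.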


A weighted average is unaltered if all the weights are multiplied or divided by the same quantity. Thus if the weights are $2W_1, 2W_2, \dots, 2W_n$, instead of $W_1, W_2, \dots, W_n$, the weighted average,

$$\frac{2I_1W_1 + 2I_2W_2 + \dots + 2I_nW_n}{2W_1 + 2W_2 + \dots + 2W_n},$$

is the same as before. Thus the absolute size of the weights does not matter; the relative weights only are of importance. This fact is often of great value when a large number of calculations are involved, especially as it can also be shown that slight changes in the weights do not have any material influence on the weighted average, so long as the essential relative sizes of the weights are maintained. It means, in practice, that if the weights involved are rather large numbers they can be reduced to more manageable proportions without materially affecting the resulting average. This is illustrated in the example below:


Items     Weights (1)     Products     Weights (2)     Products
  19          48             912            5              95
  25          58            1450            6             150
  23          28             644            3              69
  26          54            1404            5             130
  22          76            1672            8             176
  24          49            1176            5             120
  27          31             837            3              81
  26          42            1092            4             104
  21          19             399            2              42
  22          23             506            2              44

             428           10092           43            1011

Weighted average (1) = 10092/428 = 23.6
Weighted average (2) = 1011/43 = 23.5


The difference between these may not be of any 
significance in a particular case, and the gain due to the 
reduced computations may be quite considerable. This 
principle also operates in those cases where we may feel 
sure that the items to be averaged should be accorded 
weights indicating their relative importance, but where at 
the same time we may have difficulties in determining 
what these weights actually are. In such cases we may 
be able to decide on approximate weights, and so long as 
these approximations do give a fair idea of the relative 
importance of the items in the group, we can trust 
the resulting average, on the grounds that any slight 
inaccuracies in the weights will not materially affect the 
result of the averaging process. Such cases arise quite 

frequently. 
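Both properties, the exact indifference to a common multiplier and the near indifference to rough weights, can be seen on the figures of the example. The following sketch is illustrative; the names are assumptions, and the second set of weights is simply the first reduced to about one-tenth.

```python
items = [19, 25, 23, 26, 22, 24, 27, 26, 21, 22]
weights_full = [48, 58, 28, 54, 76, 49, 31, 42, 19, 23]
weights_small = [5, 6, 3, 5, 8, 5, 3, 4, 2, 2]    # roughly one-tenth of the full weights

def weighted_average(items, weights):
    return sum(i * w for i, w in zip(items, weights)) / sum(weights)

full = weighted_average(items, weights_full)
doubled = weighted_average(items, [2 * w for w in weights_full])
small = weighted_average(items, weights_small)

print(round(full, 1), round(doubled, 1), round(small, 1))
```

Doubling leaves the average exactly as it was, while the rounded weights shift it by about a tenth of a unit.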


When the items to be averaged are all very nearly the same, e.g. percentage figures near 100, we can often reduce the computations by expressing each figure as an excess or defect of some standard figure, as in the following illustration:


            Items
Items       from 100      Weights      Products
  95          - 5             5          - 25
 103          + 3            15          + 45
 102          + 2            13          + 26
  98          - 2             8          - 16
  97          - 3             4          - 12
 100            0             6             0
 106          + 6             9          + 54
  93          - 7            11          - 77
  98          - 2             8          - 16
  99          - 1             6          -  6
 101          + 1             7          +  7
 102          + 2            12          + 24
  97          - 3            10          - 30

                            114          - 26

Weighted average from 100 = - 26/114 = - .2
Weighted average is 99.8


In such cases the saving in arithmetic is considerable, with, of course, a reduced chance of making computation errors.
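That the device of a standard figure gives the same answer as the full calculation may be confirmed as follows. The sketch is a hedged illustration using the figures of the table; the names `ORIGIN`, `shortcut` and `direct` are assumptions.

```python
items = [95, 103, 102, 98, 97, 100, 106, 93, 98, 99, 101, 102, 97]
weights = [5, 15, 13, 8, 4, 6, 9, 11, 8, 6, 7, 12, 10]
ORIGIN = 100   # the standard figure from which excesses and defects are reckoned

offsets = [i - ORIGIN for i in items]
sum_products = sum(o * w for o, w in zip(offsets, weights))

# Weighted average of the small offsets, restored to the original scale.
shortcut = ORIGIN + sum_products / sum(weights)

# The same average computed on the items themselves.
direct = sum(i * w for i, w in zip(items, weights)) / sum(weights)

print(sum_products, sum(weights), round(shortcut, 1))
```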

It is interesting to consider, in the general case, how great is the difference between the ordinary simple average and the weighted average. We may say, generally, that if the allocation of weights to the items is such that the larger items have the smaller weights, and the smaller items have the larger weights, the smaller items have more influence in the formation of the weighted average, and this is less than the simple average. On the other hand, if the larger weights are attached to the larger items, and the smaller weights to the smaller items, the opposite is the case, and the weighted average is greater than the simple average. But if the allocation of weights to the items is not apparently connected with the size of the items, that is, if larger weights are as much attached to larger as to smaller items, and similarly with smaller weights, the difference between the two kinds of average may be inconsiderable. We can show this algebraically, as below.

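We are given $n$ items, $I_1, I_2, \dots, I_n$, having weights $W_1, W_2, \dots, W_n$. Let us call the simple average of the items $\bar{I}$, and the simple average of the weights $\bar{W}$; then

$$\bar{I} = \frac{I_1 + I_2 + \dots + I_n}{n}, \qquad \bar{W} = \frac{W_1 + W_2 + \dots + W_n}{n}.$$

Let us obtain the deviations of the items and the weights from their averages, and call them $i_1, i_2, \dots, i_n$; $w_1, w_2, \dots, w_n$. Then $I_1 = \bar{I} + i_1$, $I_2 = \bar{I} + i_2$, ..., $I_n = \bar{I} + i_n$; $W_1 = \bar{W} + w_1$, $W_2 = \bar{W} + w_2$, ..., $W_n = \bar{W} + w_n$. We have obviously

$$\frac{i_1 + i_2 + \dots + i_n}{n} = 0, \qquad \frac{w_1 + w_2 + \dots + w_n}{n} = 0.$$

The process of obtaining the weighted average is set out below:

Items               Weights             Products
$\bar{I} + i_1$     $\bar{W} + w_1$     $\bar{I}\bar{W} + \bar{W}i_1 + \bar{I}w_1 + i_1w_1$
$\bar{I} + i_2$     $\bar{W} + w_2$     $\bar{I}\bar{W} + \bar{W}i_2 + \bar{I}w_2 + i_2w_2$
. . . .             . . . .             . . . .
$\bar{I} + i_n$     $\bar{W} + w_n$     $\bar{I}\bar{W} + \bar{W}i_n + \bar{I}w_n + i_nw_n$

                    $n\bar{W}$          $n\bar{I}\bar{W} + i_1w_1 + i_2w_2 + \dots + i_nw_n$

Thus the weighted average is

$$\frac{n\bar{I}\bar{W} + i_1w_1 + i_2w_2 + \dots + i_nw_n}{n\bar{W}} = \bar{I} + \frac{i_1w_1 + i_2w_2 + \dots + i_nw_n}{n\bar{W}}.$$

The difference between the weighted average and the simple average is therefore

$$\frac{i_1w_1 + i_2w_2 + \dots + i_nw_n}{n\bar{W}}.$$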

Now, some of the $i$'s are positive and some are negative, so also with the $w$'s; and if positive $i$'s tend to be associated with positive $w$'s, and negative $i$'s with negative $w$'s, then this difference will be positive and the weighted average will be greater than the simple average. Here is the case of the larger items having the larger weights and the smaller items the smaller weights. On the other hand, if the positive $w$'s tend to be associated with negative $i$'s and negative $w$'s with positive $i$'s, the sum of the products $(iw)$ will be negative and the weighted average will be less than the simple average. This is the case of the larger items having the smaller weights and the smaller items the larger weights. But if we find that there is no particular allocation of the weights to the items, determined by size, then sometimes positive $w$'s will be found with negative $i$'s as well as with positive $i$'s; and negative $w$'s will also be associated indiscriminately with positive and negative $i$'s. In this case, the products $(iw)$ will be irregularly positive and negative and a good deal of cancellation occurs when the total $i_1w_1 + i_2w_2 + \dots + i_nw_n$ is obtained, so that this total may be quite small, and the difference between the weighted average and the simple average will be smaller still, since this total is divided by the sum of the weights, $n\bar{W}$. In this case the two kinds of average may be, for practical purposes, the same.


Let us take a simple illustration:

(1) Large items with large weights, small items with small weights.

                                      Signs of Deviations from averages
Items     Weights     Products        Items     Weights     Products
   1         1            1             -          -           +
   2         2            4             -          -           +
   3         3            9             -          -           +
   4         4           16             -          -           +
   5         5           25             -          -           +
   6         6           36             +          +           +
   7         7           49             +          +           +
   8         8           64             +          +           +
   9         9           81             +          +           +
  10        10          100             +          +           +

  55        55          385

Simple average = 55/10 = 5.5
Weighted average = 385/55 = 7.0

(2) Large items with small weights, small items with large weights.

                                      Signs of Deviations from averages
Items     Weights     Products        Items     Weights     Products
   1        10           10             -          +           -
   2         9           18             -          +           -
   3         8           24             -          +           -
   4         7           28             -          +           -
   5         6           30             -          +           -
   6         5           30             +          -           -
   7         4           28             +          -           -
   8         3           24             +          -           -
   9         2           18             +          -           -
  10         1           10             +          -           -

  55        55          220

Simple average = 5.5
Weighted average = 220/55 = 4.0


(3) Items and weights not associated.

                                      Signs of Deviations from averages
Items     Weights     Products        Items     Weights     Products
   1        10           10             -          +           -
   2         3            6             -          -           +
   3         6           18             -          +           -
   4         4           16             -          -           +
   5         5           25             -          -           +
   6         8           48             +          +           +
   7         2           14             +          -           -
   8         1            8             +          -           -
   9         9           81             +          +           +
  10         7           70             +          +           +

  55        55          296

Simple average = 5.5
Weighted average = 296/55 = 5.4

In the last, the weights were given to the items in haphazard fashion determined by the code

   W    E    I    G    H    T    D    A    V    R
  10    3    6    4    5    8    2    1    9    7
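The three illustrations, and the algebraic expression for the difference between the two averages, can be checked together. The sketch below is illustrative only; the function names and the labels given to the three cases are assumptions.

```python
def simple_average(values):
    return sum(values) / len(values)

def weighted_average(items, weights):
    return sum(i * w for i, w in zip(items, weights)) / sum(weights)

def difference_from_deviations(items, weights):
    """Sum of the products of the deviations, divided by the sum of the weights."""
    i_bar, w_bar = simple_average(items), simple_average(weights)
    return sum((i - i_bar) * (w - w_bar) for i, w in zip(items, weights)) / sum(weights)

items = list(range(1, 11))
cases = {
    "large with large": list(range(1, 11)),
    "large with small": list(range(10, 0, -1)),
    "not associated": [10, 3, 6, 4, 5, 8, 2, 1, 9, 7],
}

results = {}
for name, weights in cases.items():
    gap = weighted_average(items, weights) - simple_average(items)
    # The gap agrees with the expression obtained algebraically.
    assert abs(gap - difference_from_deviations(items, weights)) < 1e-9
    results[name] = round(weighted_average(items, weights), 1)

print(results)
```

The sign of the sum of products of deviations decides on which side of the simple average, 5.5, the weighted average falls, and in the third case the cancellation leaves it close to 5.5.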


This appreciation of the reason why the weighted average and the simple average differ is very profitable in certain cases, where it is known definitely that the average of a group should be a weighted average (as the items should be considered as having different degrees of importance) but where we have great difficulty in deciding the weights. We may have reason for believing that the weights, if known, would not be in the slightest degree influenced by the size of the items; that, as likely as not, large and small weights would be associated indiscriminately with large and small items. If this is the case we are justified in taking the simple average as a good approximation to the unknown weighted average.

There are occasions when a weighted average of a series of items is really an ordinary simple average. For instance, the male population of England and Wales in 1921 was 18,075,000, and the average number of deaths of males in 1920, 1921, 1922 was 240,605; the ratio $\dfrac{240,605}{18,075,000} \times 1,000 = 13.3$ is called the crude death-rate. Now the male population may be divided into age-groups, and the numbers dead may also be so divided, and the death-rates of males of different ages may be calculated, to indicate the incidence of death at different ages.


England and Wales (Males)

                  (1)              (2)                 (3)
               Population        Deaths             Death-rate
  Ages           1921           (Average            (per 1,000)
                 (000)          1920-1922)
Under 5          1,681            55,361               32.9
   5-            1,767             5,151                2.9
  10-            1,837             3,314                1.8
  15-            1,728             4,901                2.8
  20-            1,448             5,447                3.8
  25-            2,621            11,551                4.4
  35-            2,496            17,004                6.8
  45-            2,133            25,073               11.8
  55-            1,383            34,639               25.0
  65-              730            42,025               57.6
  75-              225            29,685              131.6
  85-               25             6,455              259.9

                18,075           240,605               13.3


This figure, 13.3, may also be considered as the average of the death-rates in the different age-groups; if it is so regarded it is as a weighted average, the weights being the numbers in these age-groups. In the table above column (3) is obtained by dividing the figures of column (2) by those in column (1). If we consider the figures in column (3) as a series of items to be averaged, the weights being the figures in column (1), the figures in column (2) will be the products of the items and weights, and the weighted average is obtained by dividing the total of column (2) by that of column (1).
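The identity between the crude death-rate and the weighted average of the age-group rates can be shown numerically. In the sketch below the rates are recomputed from columns (1) and (2) rather than taken from column (3), so that rounding does not disturb the comparison; the names are illustrative assumptions, and the 20- group is taken as 1,448 thousand, the figure consistent with the printed column total of 18,075.

```python
# Column (1): male population in thousands, by age-group.
population = [1681, 1767, 1837, 1728, 1448, 2621, 2496, 2133, 1383, 730, 225, 25]
# Column (2): average annual deaths, 1920-1922.
deaths = [55361, 5151, 3314, 4901, 5447, 11551, 17004, 25073, 34639, 42025, 29685, 6455]

# Column (3): deaths per thousand persons in each age-group.
rates = [d / p for d, p in zip(deaths, population)]

# Weighted average of the rates, the weights being the populations.
weighted = sum(r * p for r, p in zip(rates, population)) / sum(population)

# Crude death-rate from the totals alone.
crude = sum(deaths) / sum(population)

print(round(weighted, 1), round(crude, 1))
```

Because each product of rate and weight is simply the number of deaths in the group, the two figures must agree exactly, apart from the limits of floating-point arithmetic.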

Generally, a simple average may be calculated from totals relating to a whole group, and at the same time similar simple averages may be calculated from corresponding figures belonging to the various constituent parts into which the whole group has been divided. The simple average for the whole group will be a weighted average of the similar averages of the constituent parts. As a further illustration, the rate of growth of the population of England and Wales from 1921 to 1931 may be expressed as 5.4 per cent of the population in 1921; the rates of growth of the different counties may also be calculated. The figure 5.4 for the whole country will be the weighted average of these rates of growth, calculated for the counties, the weights being the county populations in 1921.

On p. 72 certain average outputs per man-shift were worked out. It will now be realized that these averages can be regarded either as weighted averages or simple averages. We showed that if in the six districts the following number of man-shifts in the standard period were worked:


District             I        II       III       IV        V        VI       Total
Man-shifts (000)   36,010   15,045   42,145   58,149   84,121   35,637   271,107


Then the total output would have been 4,866.4 million
cwt. in 1924, first quarter, giving an average output per
man-shift of 17.95 cwt. But we may now think of this
last figure as the weighted average of the output per
man-shift for each district, the weights being the number
of man-shifts in the standard period. The calculations
shown on p. 72 are merely those performed when
working out the weighted average. As a matter of
interest we may rework these weighted averages, taking as
weights 36, 15, 42, 58, 84, 36 instead of the original
figures. We should get the results detailed below:


                      Items (1)                          Items (2)
Weights    Items    From      For Weighted    Items    From      For Weighted
                    18.00     Average                  18.00     Average
   36      18.95    + 0.95    +  34.20        19.01    + 1.01    +  36.36
   15      17.12    − 0.88    −  13.20        18.05    + 0.05    +   0.75
   42      17.15    − 0.85    −  35.70        17.91    − 0.09    −   3.78
   58      16.19    − 1.81    − 104.98        16.36    − 1.64    −  95.12
   84      20.60    + 2.60    + 218.40        20.30    + 2.30    + 193.20
   36      14.88    − 3.12    − 112.32        14.79    − 3.21    − 115.56

  271                         −  13.60                           +  15.85

Weighted average (1) = 18.00 − 13.60/271 = 17.95

Weighted average (2) = 18.00 + 15.85/271 = 18.06


These results are the same as those obtained previously
using the original figures as weights.
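The working in the table, deviations from 18.00 multiplied by the rounded weights, may be sketched in Python as follows. This is only an illustration of the arithmetic; the function names are ours, and the figure 18.00 is simply the convenient starting value used in the table.

```python
weights = [36, 15, 42, 58, 84, 36]
items_1 = [18.95, 17.12, 17.15, 16.19, 20.60, 14.88]
items_2 = [19.01, 18.05, 17.91, 16.36, 20.30, 14.79]


def weighted_average_from(items, weights, assumed=18.0):
    """Weighted average worked from deviations about an assumed value."""
    total = sum(w * (x - assumed) for x, w in zip(items, weights))
    return assumed + total / sum(weights)


def weighted_average(items, weights):
    """Direct weighted average, for comparison with the short method."""
    return sum(w * x for x, w in zip(items, weights)) / sum(weights)


print(round(weighted_average_from(items_1, weights), 2))  # 17.95
print(round(weighted_average_from(items_2, weights), 2))  # 18.06
```

The short method and the direct method give the same figure; the short method merely keeps the products small enough to be worked by hand.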



Chapter 10
INDEX NUMBERS 

Weighted averages are frequently used in the calculation 
of index numbers. We may be concerned with the measurement
of the change from one period to another in a certain
factor, but this change may not be susceptible of direct 
measurement, owing to the factor not being liable to 
numerical appreciation at any time. But evidence of this 
change is observed when measurements are made of 
quantities which are influenced by it. We can then 
collect statistics relating to a group of quantities, from 
which can be calculated the changes for each quantity 
in the measurements obtained. If these are expressed as 
percentages we have a group of them, each of which owes 
its size partly to the hidden factor about which we desire 
information. We can, by taking an average of the group 
changes, get an approximate notion of the change in this 
factor. We deal, in practice, with relative changes, as 
suggested above, and not absolute changes, because the 
quantities about which we have information may be 
measured in different units, and the final result is expressed 
in relative form, as a percentage, for this reason. 

We may, for instance, be concerned with the general
change in industrial productivity in a country over a
period of time. Evidence of this change may be seen in 
a variety of ways, in changes in output of coal, of steel, 
of cotton yarns, of motor cars, of units of electricity, of
tobacco manufactures, of beer, and so on. Some of the 
units involved in these will be tons, numbers, barrels, etc. 
We can reduce these changes to a common unit by
expressing them as percentages, or, as is most often done,
instead of relating the changes in output of a particular 
industry or commodity to the output in the earlier period, 
we relate the total output in the one period to that in the 
other. In the latter case we get for our different industries 
or commodities a group of percentages, which reflect in 
some way or another the change which has taken place 
generally in industrial productivity. From this group we 
obtain an average which we take to indicate the level of 
industrial productivity at the one period compared with
that at the other period. This average, which is a per- 
centage figure, we call an Index Number of Industrial 
Production. 

Similarly, we may be concerned with attempting to 
measure, generally, changes in the Price Level. Evidence 
of this change may be observed in changes in the prices 
of wheat, pig-iron, rubber, leather, raw cotton, etc., and 
as these prices all involve different units, we express them 
as percentages, or, instead of concentrating on actual 
changes, we may express each price in the later period 
as a percentage of the corresponding price in the earlier 
period. These percentages form a group, the average is 
taken to indicate the general change from the earlier 
period to the later period, and is called for this reason 
a Price Index Number. 

Generally, we may indicate the procedure in this way:

         Statistics for   Statistics for   Percentages       Percentage Increase
Unit     First Period     Second Period
A            a_1              a_2          a_2/a_1 × 100     (a_2 − a_1)/a_1 × 100
B            b_1              b_2          b_2/b_1 × 100     (b_2 − b_1)/b_1 × 100


The Index Number is that obtained as an average of the 
first column of percentages, but sometimes the average of 
the second column is used, and thus indicates the 
percentage change from the earlier period to the later. 

As an illustration we may quote from the Ministry of 
Labour Gazette, January, 1929, where details are given of 
the calculation of the Cost of Living Index Number. 


                                  Average Price          1st Jan., 1929 compared
                                                         with July, 1914
Article              Unit        July,      1st Jan.,    Percentage   Percentage
                                 1914       1929                      Increase
                                 s. d.      s. d.
Beef: British
  Ribs               per lb.       10       1 4¾            168           68
  Thin Flank         per lb.        6½        9¼            138           38
Beef: Chilled or Frozen
  Ribs               per lb.        7¼       10¼            143           43
  Thin Flank         per lb.        4¾        5½            113           13
Mutton: British
  Legs               per lb.       10½      1 6             174           74
  Breast             per lb.        6½       10             154           54
Mutton: Frozen
  Legs               per lb.        6¾       11¾            169           69
  Breast             per lb.        4         5             127           27
Bacon (Streaky)      per lb.       11¼      1 4             143           43
Flour                per 7 lb.     10½      1 3½            146           46
Bread                per 4 lb.      5¾        8½            149           49
Tea                  per lb.     1 6¼       2 4½            155           55
Sugar (Granulated)   per lb.        2         3             149           49
Milk                 per qt.        3½        6½            189           89
Butter: Fresh        per lb.     1 2½       2 1             172           72
Butter: Salt         per lb.     1 2¼       1 11½           166           66
Cheese               per lb.        8¾      1 3             172           72
Margarine            per lb.        7         7½            106            6
Eggs (Fresh)         each           1¼        2¾            231          131
Potatoes             per 7 lb.      4¾        6½            136           36
Fish                 -              -         -             218          118

Average                                                     159           59


The Ministry of Labour Gazette generally quotes the 
increase in cost of living as a percentage, here 59 per 
cent. The index number is 159. This figure indicates 
the relative movement from the general price level of 
July, 1914, to that at the beginning of 1929. 

The problem of obtaining an index number, thus outlined, 
appears to be a simple one. In practice, however, 
difficulties occur, first, in deciding upon a base period, 
second, in deciding what amount of information is to be 
included in the scope of the calculations, third, in deciding 
what weights shall be given to the items to be averaged, 
fourth, in deciding what kind of averaging process is to 
be used in the calculation of the index number. 

With regard to the first point, it is convenient in many 
cases to relate later period figures to the same earlier 
period figures, i.e. to have a fixed base to which reference 
is made. Thus, in the Board of Trade Index of Production 
the year 1924 is the base year, the production in later 
years is expressed as a percentage of that in 1924. In 
choosing a base period consideration must be given to the 
question how far the statistics of this period may be 
regarded as reasonably normal. It is not convenient to 
choose as a period to which reference is continually being 
made one which is known to have been unusual, in the 
sense that it was a period, for instance, in which there 
was a great deal of labour unrest, or financial chaos, or 
a period of war. For if we did make such a choice, we 
should always have to qualify any index number which was
calculated with some statement calling attention to the
abnormality of the base period. For instance, after the 
War many comparisons were made with the immediate 
pre-War years, but, each time figures indicating these
comparisons were quoted, attention was drawn to the
fact that in the year before the War conditions of trade, 
employment, and so on were unusually good— we were at 
a boom period. In order to evade this difficulty, instead 
of taking a single period as base for the purpose of 
computing an index number, an artificial base period is often 
chosen. The experience of a group of years is averaged 
and these averages are taken as the base figures to which 
subsequent figures are referred. Thus, in the original 
index number of production worked out for the London 
and Cambridge Economic Service by J. W. F. Rowe (see 
Special Memorandum No. 8) the base period was 1907-1913. 
The output of coal, for instance, in a later year, say 1925, 
was expressed as a percentage of the average output in 
these years 1907-1913, and so for the other commodities 
whose output was included in the computations of the 
index number. The index was obtained as an average of 
these percentages. The wholesale price index number 
calculated by the Statist is referred to the average experience 
of the group of years 1867-1877, this period is the base 
period of the index number. 
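The use of an averaged base period may be sketched as follows. The output figures here are purely hypothetical, chosen only to make the arithmetic easy to follow; they are not the figures of any of the index numbers named above, and the function name is our own.

```python
def index_on_averaged_base(base_values, later_value):
    """Express a later figure as a percentage of the average of the base years."""
    base_average = sum(base_values) / len(base_values)
    return 100 * later_value / base_average


# Hypothetical annual outputs for seven base years (e.g. 1907-1913).
base_outputs = [100, 104, 98, 102, 106, 96, 108]
# Hypothetical output in a later year.
later_output = 95

print(round(index_on_averaged_base(base_outputs, later_output), 1))
```

An unusually good or bad single year among the seven has only a small effect on the base figure, which is the purpose of the device.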

Since the War, for many index numbers the year 1924 
has been chosen as base period, as it was considered to be 
the first post-War year which could be regarded as free 
from abnormal events ; previous years’ statistics had been 
disturbed by strikes, troubles abroad, and so on. 

So far we have been concerned with the question of 
a fixed base period to which all subsequent years should 
be referred. From some points of view, there are advantages 
in having a moving base. In this case, instead of each 
period’s figures being related to the fixed base period’s 
figures, each period’s figures are related to the preceding
period’s figures. Thus, if we were dealing with annual
figures, we would express the 1928 figures as percentages
of those for 1927, and get an index number on 1927 as
base ; then for 1929 we would express the 1929 figures as 
percentages of those for 1928 and get an index number 
for 1929 on 1928 as base ; similarly we would get an index 
number for 1930 based on 1929 ; and so on. This method, 
called also the chain base method, automatically gives the 
short period changes and is of advantage when the year 
to year movements are of more interest than the
change over a longer period of time. We can, of course, 
obtain from these a new series of index numbers referred 
to a fixed base. For instance, if we have the following 
figures : — 

Index for 1928 referred to 1927 is 125.
Index for 1929 referred to 1928 is 110.
Index for 1930 referred to 1929 is 105.

Then 125 × 110/100 will give an index number for 1929 on 1927 as base,
and 125 × 110/100 × 105/100 an index number for 1930 on 1927 as base,
and we should have these index numbers on the base 1927:

1927     1928     1929      1930
100      125      137.5     144.4
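The linking of chain base figures into a fixed base series can be sketched in Python as below. The function name is our own; the three link figures are those quoted in the text.

```python
def chain_to_fixed_base(links, base_value=100.0):
    """Convert year-on-previous-year index numbers into a fixed base series.

    links[i] is the index for one year referred to the year before it.
    The returned list starts with the base year itself.
    """
    series = [base_value]
    for link in links:
        series.append(series[-1] * link / 100.0)
    return series


links = [125, 110, 105]  # 1928 on 1927, 1929 on 1928, 1930 on 1929
fixed = chain_to_fixed_base(links)
print([round(value, 1) for value in fixed])
```

Each new figure needs only the previous figure in the series and the latest link, so the series can be extended a year at a time.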

There is another advantage of the moving or chain base 
method over the fixed base method, which is concerned 
with the available information from which the index 
number is to be calculated. In a long period of time the 
evidence, which we are using as indicative of the changing
nature of the factor we are dealing with, may alter
somewhat, and it may be difficult to get proper comparisons.
For instance, if we are dealing with a price index
number based on a period before 1900 (say), we cannot
compare prices of motor cars in 1932 with similar prices 
in the base period. Prices current in the base period 
similarly may have no counterpart in the later period. 
The same kind of difficulty arises when the same description 
is used at two periods to cover substantially different 
articles, e.g. stockings made of wool at one period and 
stockings made of artificial silk at a later period. We may 
easily be able to make comparisons between successive 
years, but may find difficulty in establishing the proper 
contacts between years some time apart. In such a case 
the chain base method enables us to link up two distant
years, where an attempt to make a direct comparison
would perhaps break down.

It is worth while, at this stage, examining the second 
of our considerations, viz. the information to be used in 
the calculation of the index number. The data used 
may be the whole information available or they may 
only be a sample of this. In certain cases sampling only 
is practicable, e.g. in the calculation of a cost of living 
index number. A glance at a grocer's shop window, or a 
list of prices of different "cuts" of meat, or a study of a 
store's list, is enough to make us realize what a large 
variety of articles may enter into consideration if we are 
dealing with changes in price of foods alone. Similar 
concentration on clothing again emphasizes the enormously 
great number of prices which we might consider. When 
we look carefully into the problem, we see that there is a 
comparatively small number of items to which we can 
restrict ourselves, which will account for the greater 
part of the ordinary necessary purchases in everyday life. 
In order to avoid this wide survey, the Ministry of Labour
in their calculations of the cost of living index number
actually do restrict themselves to a sample of all the 
possible information, knowing that the sample chosen 
represents to a large degree all ordinary purchases. Thus 
the typical purchases include beef but not salt, tea and 
sugar, potatoes but not other vegetables, bread but not 
cake. It must be remembered also that the information 
sought does not consist of a list of articles purchased, 
but a list of prices of such articles. There are 
thousands of shops in the country which quote these 
prices; and, again, instead of attempting to survey 
the whole country, a sample is taken, first of all by 
choosing a number of towns and then by choosing a number 
of shops in these towns. The cost of living index number 
inquiry is essentially an inquiry by sample. 

Similarly, in the construction of the Statist and the Board 
of Trade Wholesale Price Indices, a sample only of all the 
possible information is taken ; in the former case forty-five 
items are included, in the latter case 150 items.* So in the
case of the Index of Production issued by the London and 
Cambridge Economic Service. In this case, however, the 
problem is not one of sampling existing information but
of taking account of as much information as is available.
We can use data referring to output of coal because they
are known, we cannot use similar figures for production
of suits of clothes because they are not available. We know 
the output of pig-iron, but we do not know the output of 

On the other hand, the indices published by the Board
of Trade showing relative changes in value and volume of
the import and export trade do take account, as far as is
possible, of all the available information.

* Actually, there are somewhat more than these, as some of the
"items" are already averages of a few quotations.

Where a sample is used in the calculation of an index 
number, the sample is expected to be so representative, 
that the final index arrived at will reasonably show 
the changes with time in the factor which is being 
measured. 

We have then a certain number of items in the 
form of percentages, which are to be averaged, and 
we come to the third point, the question of weights 
to be attached to these items. Palpably, weighting is
necessary,* because the items compose a heterogeneous
group. If we are dealing with a retail food price index, 
for instance, and we have items 115, 125, 132, 116, 124, etc., 
which have been calculated from prices, at two dates, of beef, 
milk, bacon, bread, eggs, etc., we must realize the necessity 
for weights to indicate the relative degree of importance 
of these items, i.e. to indicate the relative degree of 
importance of these articles, beef, milk, bacon, bread, 
eggs, etc. The question is: what considerations are to guide
us in assessing the relative importance of these com- 
modities ? It is reasonable that we should assume that this 
should be decided by the consumers of them. The Ministry 
of Labour are guided in this connection by the tastes of 
working class households, and the importance of these 
articles is determined by the relative amounts spent on 
them in such households. If, for instance, the available 
money (say 10s.) is spent only on five articles and
distributed in this way: Beef 3s., milk 2s., bacon 2s.,
bread 2s., eggs 1s., then we may assume weights proportional
to these, viz.: Beef 30, milk 20, bacon 20,
bread 20, eggs 10, total 100. We determine the relative
importance of the articles by the way the consumers spend
their money on them. If three times as much is spent
on beef as on eggs, then we assume that the relative
importance of beef to eggs is three to one. The Ministry
of Labour use as weights, in the cost of living index number,
figures determined by many typical working class budgets.
In these budgets the amounts spent on different articles
of food are taken into account, also amounts spent on rent
and rates, clothing, fuel and light, and so on, and the
average experience is used as a guide to the relative
importance of the different items included in the index
number computations.

* A priori, that is; unless we have reason to believe, on the principle
. . . weighted average and the unweighted
average would not differ significantly.
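The derivation of weights from expenditure may be sketched as follows. The expenditure figures are those of the 10s. illustration; the five items 115, 125, 132, 116, 124 are the ones quoted earlier for beef, milk, bacon, bread and eggs, and it is an assumption of this sketch that they are paired with the articles in that order and that these five make up the whole group.

```python
# Expenditure in shillings out of the 10s. of the illustration.
expenditure = {"beef": 3, "milk": 2, "bacon": 2, "bread": 2, "eggs": 1}

# Price relatives (percentages); pairing with the articles is assumed.
relatives = {"beef": 115, "milk": 125, "bacon": 132, "bread": 116, "eggs": 124}

total_spent = sum(expenditure.values())

# Weights proportional to expenditure, scaled to total 100.
weights = {name: 100 * spent / total_spent for name, spent in expenditure.items()}

weighted_index = (
    sum(relatives[name] * weights[name] for name in relatives)
    / sum(weights.values())
)
simple_index = sum(relatives.values()) / len(relatives)

print(weights)
print(round(weighted_index, 1), round(simple_index, 1))
```

Here the weighted and unweighted figures differ by less than one point, since the weights are not strongly related to the sizes of the items.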

Similar considerations arise when we are dealing with
a number of items expressing the relative output of certain
industries or trades in a given period, compared with a
base period. The average of these items is to be
an index number of production. What method of weighting
shall we adopt to indicate the relative importance of the
different items, i.e. what figures will properly indicate the
relative importance of different industries? Are we to
use man-power, horse-power of machines used, value of
product, or value of net output? Obviously, the medium
of comparison is important. We must use weights which
are proportional to figures measured in the same unit.
The question is, what is the best unit for comparison?

cost of materials used. There are objections to using man- 
power and horse-power of machines. Mere numbers 
employed is no certain guide to the relative position of 
a particular industry in the national economy ; if we are 
dealing with numbers we should have to qualify them with 
figures expressive of the relative powers and skill of the 
persons included, which would be difficult of achievement. 
Also, horse-power of machines is no guide, because some 
industries are more mechanically equipped than others, 
merely because of their particular nature, and out of all 
relation to the place these industries occupy in any scale 
of importance. Gross output, the total value of the 
product, is not a satisfactory guide, because the gross output 
of certain industries is high, since they are producing 
finished goods from nearly finished goods which are costly 
to those industries. The gross output of a firm manu- 
facturing steel girders may be higher than that of a firm 
manufacturing pig-iron, but one would not for that reason 
assume that the production of pig-iron was not as important 
as that of steel girders. The net outputs of different 
industries are acknowledged to give the surest guide to 
their relative importance. In the two illustrations cited 
above the weights used in the calculation of the weighted 
average, which is the index number, are values. In 
many index number calculations the only possible medium 
of comparison of the units which form the heterogeneous 
group is that of value, measured in money. We can only 
compare tons of pig-iron, bushels of wheat, gallons of 
milk, thousands of barrels of beer, if we express each in 
terms of money value. 

In certain cases no weights are used, e.g. in the Statist 
Index Number of Wholesale Prices and in the similar
index calculated by the Board of Trade. Here each item
entering into the group is considered of the same degree 
of importance as the others, no attempt being made to 
assess them comparatively. The items which are averaged 
are relative prices between the later period and the base 
period of a number of raw materials and partly 
manufactured goods, such as raw cotton, pig-iron, coal, 
cotton yarns. It is hard to decide from what point of view
we should attempt to determine the relative importance 
of these commodities. We saw above that the ordinary 
simple average would not differ greatly from the weighted 
average so long as the weights were unrelated to the sizes 
of the items. If this is the case here, then no appreciable 
error is introduced into the index number by taking this
as an unweighted average.

Finally we have to consider what kind of averaging
process is to be used in the calculation of the index number.
We discussed earlier the use of the average and the median
as representative figures of a group. The geometric
mean of a group of numbers is also sometimes used as a
representative. If the group consists of n values
x_1, x_2, ... x_n, the geometric mean is

\[ \sqrt[n]{x_1 x_2 \dots x_n}. \]

It is usually computed by taking the simple average of the
logarithms of the numbers. Thus the logarithm of the
geometric mean is

\[ \frac{1}{n}\,(\log x_1 + \log x_2 + \dots + \log x_n). \]

If the original data are given in the usual tabular form in
grades, the calculation of the geometric mean is laborious;
when the original values of the items to be averaged are
known, then little more work is required than is
necessary for the simple average. Just as a weighted
average can be calculated, so can a weighted geometric
mean or a weighted median be obtained. The latter is 
not often used ; to obtain the former, we get first the 
weighted average of the logarithms of the items. If the 
items are I_1, I_2, ... I_n and the weights are W_1, W_2,
... W_n, then the logarithm of the weighted geometric
mean is

\[ \frac{W_1 \log I_1 + W_2 \log I_2 + \dots + W_n \log I_n}{W_1 + W_2 + \dots + W_n}. \]
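Both forms of the geometric mean can be computed through logarithms exactly as described. The sketch below uses natural logarithms from Python's `math` module where the text would have used tables of common logarithms; the base of the logarithms makes no difference to the result. The function names and the example figures are our own.

```python
import math


def geometric_mean(values):
    """Antilogarithm of the simple average of the logarithms."""
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


def weighted_geometric_mean(items, weights):
    """Antilogarithm of the weighted average of the logarithms of the items."""
    weighted_log_sum = sum(w * math.log(i) for i, w in zip(items, weights))
    return math.exp(weighted_log_sum / sum(weights))


items = [115.0, 140.0]
print(round(geometric_mean(items), 1))
print(round(weighted_geometric_mean(items, [3, 1]), 1))
```

With equal weights the weighted form reduces to the ordinary geometric mean, just as the weighted average reduces to the simple average.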

The geometric mean is of value when we are considering 
items in a group from the point of view of their relative 
differences rather than from the point of view of absolute 
differences, and is therefore reasonably used in index 
number computations, where the items averaged are 
themselves percentages. The geometric mean is also the 
more appropriate where the moving or chain base method 
is used, and certain theoretical considerations dealt with 
below support the claims of the geometric average. These 
considerations are concerned with the general problem of 
whether the index number performs satisfactorily its 
function of indicating the required change. 

If we are dealing with a group of items, which show an 
average increase of (say) 25 per cent from one period to 
another, then the change can also be stated as a decrease 
of 20 per cent from the later period to the first period. 
The averaging process should show this. Suppose we take 
a simple illustration with two items. 


           Amount in               Year (2)/Year (1)    Year (1)/Year (2)
        Year (1)   Year (2)            × 100                × 100
A          40         46               115.0                 86.9
B          60         84               140.0                 71.4

Simple average                         127.5                 79.15
Geometric mean                         126.9                 78.8


Using the arithmetic average, we find that the year
(2)’s figures are higher than those of year (1) by 27.5 per
cent and therefore year (1)’s figures are lower than year (2)’s
by 21.55 per cent of the latter. Actually from the index
number we get 20.85 as this figure (100 − 79.15). The
geometric mean shows year (2) as higher than year (1) by
26.9 per cent, which is equivalent to saying that year (1)
is lower than year (2) by 21.2 per cent, agreeing with
the index calculated for year (1) on year (2) as base.
(100/127.5 × 100 is not the same as 79.15, but 100/126.9 × 100
is equal to 78.8.) From the point of view of this test of
the efficiency of an index number, the geometric mean
is the more satisfactory, but of course in practice the
difference between the results obtained by the two methods
may be slight, and if the ordinary average method does
not satisfy this test exactly, it may be held to do so to a
sufficient degree of accuracy to warrant the method being
used in the computation of these index numbers.
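The test applied above can be sketched in Python with the two items A and B of the illustration. If an average passes the test, the index of year (2) on year (1) multiplied by the index of year (1) on year (2), and divided by 100, should come back to 100. The function names are our own, and the percentages are carried unrounded, so the simple average of the reversed items comes out slightly above the 79.15 obtained from the rounded figures.

```python
import math

year_1 = {"A": 40, "B": 60}
year_2 = {"A": 46, "B": 84}

# Items as percentages in each direction.
forward = [100 * year_2[k] / year_1[k] for k in year_1]   # year (2) on year (1)
backward = [100 * year_1[k] / year_2[k] for k in year_1]  # year (1) on year (2)


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Product of the two indices, divided by 100; exactly 100 if the test is met.
arithmetic_check = arithmetic_mean(forward) * arithmetic_mean(backward) / 100
geometric_check = geometric_mean(forward) * geometric_mean(backward) / 100

print(round(arithmetic_check, 2), round(geometric_check, 2))
```

The geometric mean returns 100, while the arithmetic average overshoots by about one part in a hundred in this example.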

A geometric average is used in the calculation of the 
Board of Trade Wholesale Price Index Number, but a 
simple average is used in the calculation of that of the 
Statist. 

It is interesting to compare the three following index 
numbers: The Board of Trade Wholesale Price Index 
Number, the Statist Wholesale Price Index Number, and 
the index number calculated by the Board of Trade showing 
the changes in average values of retained imports. The 
first two are different attempts to measure changes in 
wholesale prices, 150 items are used in the Board of 
Trade computations, forty-five items in those of the 
Statist. No weights are used, but the Board of Trade
uses a geometric mean and the Statist a simple average.
The last is calculated from the average values of retained 
imports by means of a weighted average. As many of 
the available data as possible enter into the calculations 
and the weights are the values of the different descriptions 
of retained imports entering in 1924. Thus all these three 
are different in structure, yet as the following table shows,
they tell nearly the same tale over a period of time. 



          Wholesale Price Index Number       Board of Trade:
Year      Board of Trade      Statist *      Average value of
                                             Retained Imports
1924          100               100              100
1925           95.8              98               98.8
1926           89.1              90               90.4
1927           85.2              88               86.6
1928           84.4              86               87.7
1929           82.1              83               85.7
1930           71.9              69               75.6
1931           62.6              59               61.2

* The Statist figures have been reworked on a 1924 base.
The other figures are taken from the Journal of the Royal Statistical
Society, 1932, p. 614.

The agreement between these three series shows that
approximately the same results are achieved, even though
the sampling and the arithmetical processes differ.

There is one method of calculation of an index number
which can be described in a somewhat different manner.
It is that case where a weighted average is used for the
calculation of a price index. Here the items are relative
prices in the later period to those of the earlier period.
The weights are the values of the articles for which we
have price quotations (or figures proportional to them),
these values being expenditure on the articles by the
average individual consumer, or the total amount expended
by all consumers. We may set out the method of
calculation as follows:



                  Price              Relative Price × 100            Weights    Products
Article    Period      Period                                        (Values)
           (1)         (2)
A          p(a)_1      p(a)_2       p(a)_2/p(a)_1 × 100 = I(a)       V(a)       I(a) × V(a)
B          p(b)_1      p(b)_2       p(b)_2/p(b)_1 × 100 = I(b)       V(b)       I(b) × V(b)
C          p(c)_1      p(c)_2       p(c)_2/p(c)_1 × 100 = I(c)       V(c)       I(c) × V(c)
D          p(d)_1      p(d)_2       p(d)_2/p(d)_1 × 100 = I(d)       V(d)       I(d) × V(d)
.          .           .            .                                .          .

The index number is the weighted average:

\[ \frac{I(a)V(a) + I(b)V(b) + \dots}{V(a) + V(b) + \dots} \]

Now suppose that we have decided that the appropriate
weights are the values in the first period (there may be a
different distribution of values in the second period),
these values are those of actual quantities consumed,
and quantity × price is equal to value. If we refer
to the quantities as q(a), q(b), q(c), . . . then we have
q(a) × p(a) = V(a), q(b) × p(b) = V(b), and so on, or
for the period (1), q(a)_1 × p(a)_1 = V(a)_1.

Taking the first period values as weights, the index number
may be written:

\[ \frac{\dfrac{p(a)_2}{p(a)_1} \times 100 \times V(a)_1 + \dfrac{p(b)_2}{p(b)_1} \times 100 \times V(b)_1 + \dots}{V(a)_1 + V(b)_1 + \dots} \]

which is equivalent to

\[ \frac{p(a)_2 \times q(a)_1 + p(b)_2 \times q(b)_1 + \dots}{p(a)_1 \times q(a)_1 + p(b)_1 \times q(b)_1 + \dots} \times 100. \]



Now in the numerator, p(a)_2 × q(a)_1 may be considered as
the amount of money which would be necessary to purchase
the quantity of the article A at the later period’s price,
p(b)_2 × q(b)_1 may similarly be considered as the amount
necessary for the quantity of this article at its later period’s
price, and so with the other articles and their contribution
to the numerator.

                       Quantities                Price                Expenditure
Article                Purchased    Unit        July,    1st Jan.,   July,    1st Jan.,
                                                1914     1929        1914     1929
                                                s. d.    s. d.       d.       d.
Beef, British:
  Ribs                 1 lb.        per lb.       10     1 4¾        10.0     16.8
  Thin Flank           1 lb.        per lb.        6½      9¼         6.5      9.3
Beef, Chilled or Frozen:
  Ribs                 1.35 lb.     per lb.        7¼     10¼         9.8     13.9
  Thin Flank           1.35 lb.     per lb.        4¾      5½         6.4      7.4
Mutton, British:
  Legs                 0.48 lb.     per lb.       10½    1 6          5.0      8.6
  Breast               0.48 lb.     per lb.        6½     10          3.1      4.8
Mutton, Frozen:
  Legs                 0.75 lb.     per lb.        6¾     11¾         5.1      8.8
  Breast               0.75 lb.     per lb.        4       5          3.0      3.8
Bacon, Streaky         1.1 lb.      per lb.       11¼    1 4         12.4     17.6
Flour                  9.0 lb.      per 7 lb.     10½    1 3½        13.5     19.9
Bread                  23.5 lb.     per 4 lb.      5¾      8½        33.8     49.9
Tea                    0.8 lb.      per lb.      1 6¼    2 4½        14.6     22.8
Sugar, Granulated      6.1 lb.      per lb.        2       3         12.2     18.3
Milk                   4.7 qts.     per qt.        3½      6½        16.5     30.6
Butter, Fresh          0.95 lb.     per lb.      1 2½    2 1         13.8     23.7
Butter, Salt           0.98 lb.     per lb.      1 2¼    1 11½       14.5     23.0
Cheese                 0.8 lb.      per lb.        8¾    1 3          7.1     12.0
Margarine              0.9 lb.      per lb.        7       7½         6.3      6.8
Eggs, Fresh            10           each           1¼      2¾        12.5     27.5
Potatoes               17 lb.       per 7 lb.      4¾      6½        11.6     15.8
Fish                   -            -              -       -          6.1     13.3

Total                                                                223.8    354.6

354.6/223.8 = 159/100

The Index number is 159.

(The "Quantities purchased" figures in the above table are taken
from Prices and Wages in the United Kingdom, 1914-1920, by Professor
A. L. Bowley.)

Thus the numerator may be reckoned as the total value 
of the first period’s consumption revalued at the later 
period’s prices. The denominator is, of course, the total 
value of the first period’s consumption at prices then 
current. In practice we may have available the quantities 
consumed, and it may be a simpler procedure to revalue 
them at the later period’s prices, rather than to work out 
the price ratios and compute the index number from the 
original formula. For instance, in the Ministry of Labour 
Retail Food Price Index, we can use the expenditure in 
a typical working-class budget on different articles, as 
weights attached to the relative prices, or we can revalue 
the quantities purchased at the later period’s prices, and 
we shall get the same result either way. (Table on p. 167.) 
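That the two procedures give the same result can be checked with a short sketch in Python. The prices and quantities below are hypothetical round figures chosen for the purpose, not those of the Ministry of Labour table, and the function names are our own.

```python
# Hypothetical prices (pence) and first-period quantities for three articles.
p1 = {"bread": 6.0, "milk": 4.0, "sugar": 2.0}  # first period prices
p2 = {"bread": 9.0, "milk": 7.0, "sugar": 3.0}  # later period prices
q1 = {"bread": 5.0, "milk": 4.0, "sugar": 6.0}  # first period quantities


def weighted_relatives_index(p1, p2, q1):
    """Weighted average of price relatives, weights = first period values."""
    values = {k: p1[k] * q1[k] for k in p1}
    numerator = sum(100 * p2[k] / p1[k] * values[k] for k in p1)
    return numerator / sum(values.values())


def revalued_index(p1, p2, q1):
    """First period quantities revalued at later prices, as a percentage
    of their value at first period prices."""
    later_cost = sum(p2[k] * q1[k] for k in p1)
    first_cost = sum(p1[k] * q1[k] for k in p1)
    return 100 * later_cost / first_cost


print(round(weighted_relatives_index(p1, p2, q1), 1))
print(round(revalued_index(p1, p2, q1), 1))
```

The first period price cancels in each term of the weighted average, which is why revaluing the quantities gives the identical figure.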

This method is used by the Board of Trade in the
calculation of the change in average values of retained
imports, the figures quoted on p. 165. The quantities
imported in the base year (1924) are revalued at the average
values of the later year (average value of a commodity is
obtained by dividing the total value by the quantity).
The total resulting from this revaluation is expressed as
a percentage of the total value in the base year, giving
the index number.

The price index number is sometimes expressed briefly
in the form

\[ \frac{\mathrm{Sum}\left(\dfrac{p_2}{p_1} \times 100 \times V_1\right)}{\mathrm{Sum}(V_1)}, \]

where the summation refers to
the different articles or commodities, and in its new form as

\[ \frac{\mathrm{Sum}(p_2 q_1)}{\mathrm{Sum}(p_1 q_1)} \times 100. \]

The Board of Trade also calculate an index number showing
changes in volume of imports and exports. In this case
the items to be averaged are q_2/q_1 × 100, the relative
quantities in the later period to those in the earlier period.
In the averaging process the weights used are again the
values of the different commodities imported or exported,
and the index number is written briefly as

\[ \frac{\mathrm{Sum}\left(\dfrac{q_2}{q_1} \times 100 \times V_1\right)}{\mathrm{Sum}(V_1)}. \]

This may again be rewritten in the form

\[ \frac{\mathrm{Sum}(q_2 p_1)}{\mathrm{Sum}(q_1 p_1)} \times 100. \]


Here the numerator consists of the sum of the results of 
revaluing the quantities of the later period at the average 
values of the earlier period. The denominator is the total 
value of the first period’s quantities at the prices then 
current. 
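The same check can be made for the volume index. This is a sketch only, with hypothetical commodities and figures chosen for easy arithmetic; the names and numbers are assumptions, not Board of Trade data.

```python
# Hypothetical commodities (illustrative figures only):
# p1 = average value per unit in the earlier period,
# q1, q2 = quantities in the earlier and later periods.
commodities = [
    {"name": "wheat", "p1": 10.0, "q1": 100.0, "q2": 110.0},
    {"name": "cotton", "p1": 50.0, "q1": 40.0, "q2": 30.0},
    {"name": "timber", "p1": 4.0, "q1": 250.0, "q2": 300.0},
]


def volume_index_from_relatives(items):
    """Weighted average of quantity relatives (q2/q1 x 100),
    weighted by the earlier period's values p1 x q1."""
    weighted = sum((c["q2"] / c["q1"]) * 100.0 * (c["p1"] * c["q1"]) for c in items)
    total_value = sum(c["p1"] * c["q1"] for c in items)
    return weighted / total_value


def volume_index_from_revaluation(items):
    """Later period's quantities revalued at the earlier period's
    average values, as a percentage of the earlier period's total value."""
    revalued = sum(c["q2"] * c["p1"] for c in items)
    base_value = sum(c["q1"] * c["p1"] for c in items)
    return 100.0 * revalued / base_value


print(volume_index_from_relatives(commodities))
print(volume_index_from_revaluation(commodities))
```

With these invented figures the earlier period's total value is 4,000 and the revalued total is 3,800, so both routes give an index of 95.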

This method of stating the index number computation
has the merit of putting it in a form which conveys
something more definite to a person trying to appreciate the
meaning of an index number, than is understood by a mere
abstraction like a weighted average. But it must be
remembered that the basic idea is that we are averaging
certain percentages, and it may be considered as a fortunate
“accident” that we are able to translate this into
somewhat simpler terms in these cases mentioned. The


APPENDIX 


The Cost of Living Index Number 

Reference is made to a pamphlet issued by the Ministry 
of Labour, “ The Cost of Living Index Number : Method 
of Compilation.” 

“ The statistics prepared by the Ministry of Labour are
designed to measure the average increase in the cost of 
maintaining unchanged the pre-War standard of living 
of the working classes.” 

The foodstuffs included account for about 75 per cent 
of working class expenditure on food. Retail prices at the 
beginning of each month, of these, are obtained by the 
managers of Employment Exchanges from retailers in 
their localities engaged in a working-class trade. The 
information is collected in all towns with a population of 
50,000 or more at the Census of 1911, and in 420 smaller 
towns and villages throughout the country. The total 
number of retailers is over 5,000. 

The weights used are based on the average expenditure 
of 1,944 urban working class families, this information 
having been collected by the Board of Trade in 1904. 
The use of figures relating to 1904 instead of 1914 is 
considered reasonable on the grounds that between 1904 
and 1914 no great change took place in the standard of 
living. The expenditure on margarine did change, and 
this item had special treatment. The weights given in 
the publication referred to above are:


Beef     .   .   48        Sugar    .   .   19
Mutton   .   .   24        Milk     .   .   25
Bacon    .   .   19        Butter   .   .   41
Fish     .   .    9        Cheese   .   .   10
Flour    .   .   20        Margarine    .   10
Bread    .   .   50        Eggs     .   .   19
Tea      .   .   22        Potatoes .   .   18

                       Total    .   .  334


The weighted average increase in the relative prices of 
foodstuffs is combined with similar figures indicating 
changes in rents, clothing, fuel, and light, and other items. 

As regards rents, which includes rates and water rates, 
the information collected relates to that concerning 
controlled and decontrolled rents. Inquiries have been 
made respecting thirty-nine large towns on the subject of 
controlled rents, and from twenty-nine large towns about 
the proportion of working-class dwellings decontrolled, and 
the subsequent increase in rent. 

As regards clothing, information is obtained as to 
changes in retail prices of men’s suits and overcoats (ready- 
made and bespoke), woollen and cotton materials, under- 
clothing and hosiery, and boots. Inquiry forms are 
completed each month by 300 outfitters, drapers, and boot 
retailers in eighty-one towns. 

In the fuel and light group are included coal, gas, oil,
candles, and matches. Information respecting prices of
coal is obtained from twenty-nine principal towns, prices
of gas are obtained from twenty-four towns, and similar
details about lamp oil, candles and matches refer to
forty-nine towns and ninety-one towns respectively. In the
necessary aggregation of this information a weight of 6
is given to the relative increase in the price of coal, a
weight of 3 to that of gas, and a weight of 0.7 is
allocated to the change of price of oil, candles, and matches
taken together. These weights are determined by the
relative expenditure on these items in the pre-War budgets
already mentioned.

Amongst other items are soap and soda; domestic
ironmongery, brushware, and pottery; tobacco and
cigarettes; fares; and newspapers. Prices of soap and
soda are obtained from ninety-one towns, those of the
next group from forty towns. As to the rest, the
tobacco manufacturers' retail price list, the principal
transport undertakings and the Ministry of Transport,
and the daily Press supply the necessary information.

The combination of all this information into a single
figure is performed by weighting. The weights used are
food 7½, rent 2, clothing 1½, fuel and light 1, miscellaneous ½,
total 12½. The budgets collected in 1904 showed that
on the average about 60 per cent of the total expenditure
was on food. As regards rents, a Cost of Living inquiry in
1912 showed that roughly one-sixth of working class
expenditure was on rent, thus a weight of 16 out of a total
of 100 is adopted. In the pre-War investigations the
average expenditure on clothing was less than that on
rent and, although there is wide variation from one
household to another in this respect, a weight of 12 out
of 100 is taken for this item. There are no extensive
statistical data on the other items in the index number,
but the available information suggests weights of 8 and 4
respectively out of a total of 100.
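The correspondence between the weights out of 12½ and the weights out of 100 can be verified directly. The sketch below, in Python, uses the group weights stated in the text; the percentage increases fed into the combined figure are hypothetical and serve only to show the arithmetic of the weighting.

```python
from fractions import Fraction

# Group weights as stated in the text (total 12 1/2).
weights = {
    "food": Fraction(15, 2),
    "rent": Fraction(2),
    "clothing": Fraction(3, 2),
    "fuel and light": Fraction(1),
    "miscellaneous": Fraction(1, 2),
}

total_weight = sum(weights.values())

# The same weights expressed out of a total of 100.
weights_out_of_100 = {group: w / total_weight * 100 for group, w in weights.items()}

# Hypothetical percentage increases over the base date (illustrative only).
increases = {
    "food": 30,
    "rent": 50,
    "clothing": 90,
    "fuel and light": 70,
    "miscellaneous": 75,
}


def combined_increase(group_increases, group_weights):
    """Weighted average of the group increases."""
    weighted = sum(group_increases[g] * group_weights[g] for g in group_weights)
    return weighted / sum(group_weights.values())


print({group: float(w) for group, w in weights_out_of_100.items()})
print(float(combined_increase(increases, weights)))
```

Exact fractions are used so that the halves in the weights introduce no rounding.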



Chapter 11


GRAPHS OF TIME SERIES 

Diagrammatic representations of time series are often 
used to enable changes occurring to be easily appreciated. 
The normal procedure is to represent time along a 
horizontal scale and the quantity tabulated on a vertical 
scale. There are two kinds of time series which we have to 
consider. The first is a series of figures relating to some 
quantity which is measured at particular instants of time. 
Of this type are census of population figures, which give 
the population existing in a country on a given day in a 
particular year, or the estimates made annually by the 
Ministry of Labour of the total number of insured workers 
in July. The second type is a series of figures giving the 
aggregate experience in a number of time intervals, of some 
particular quantity. Of this type are the import and 
export figures giving totals for a series of months or years, 
or figures of total output of coal weekly. 

When we use a horizontal line in a diagram to represent
time, we are quite rightly allowing lengths on the line to
correspond to intervals in time. Thus if we take 1 inch
to represent an interval of a year, ½ inch represents an
interval of six months.

    |---------|---------|---------|
       1924      1925      1926

The mark on the scale separating one year from the next 
corresponds in time to midnight of New Year’s Eve. 




cond kind should appear in this form 


Ordinates are erected at these points corresponding in 
length to the particular statistics which are being graphed, 
whether these figures relate to an instant of time or are 
aggregates over a period. Thus the same kind of graph 
would be used to show figures giving the mid-year 
estimates of population prepared by the Registrar-General, 
as for figures giving the output of coal during the year. 
The description accompanying the diagram would state 
clearly exactly what statistics are being graphically 
represented. 

In this kind of diagram the space left between successive 
points on the horizontal scale is of a convenient size, 
arranged so that a person can read the diagram with ease. 
The points should not be too close together, otherwise the 
picture would show confusion, nor should they be too far 
apart, otherwise the diagram would be so large that the 
eye would undergo strain in attempting to absorb the 
details of a picture painted on a large canvas. The space 
between successive points naturally cannot correspond 
to a time interval since time intervals are being represented
by points. On the other hand, if we had a diagram
where (say) census populations were being shown, and 



Diagram 12. Population of England and Wales at each Census,
1821-1931. (Vertical scale: population in millions.)

straight lines. Usually there is no other significance to 
these straight lines than this, but sometimes we may be 
justified in inferring from the diagram an intermediate 
value of the variable for a period for which we had no 
data. Thus, suppose the census populations were plotted
in this way, we might suppose that a point half-way along
the line connecting the tops of the ordinates corresponding
to census populations of 1861 and 1871 would correspond
to the population of 1866 on the assumption that the
increase from 1861 to 1866 equalled that from 1866
to 1871.
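Reading a value off the straight line joining two plotted points amounts to linear interpolation. A minimal sketch follows, assuming round illustrative populations for 1861 and 1871; these two figures are placeholders chosen for the example, not values quoted in the text.

```python
def interpolate(year, year_a, value_a, year_b, value_b):
    """Value on the straight line joining (year_a, value_a) and
    (year_b, value_b) at the given intermediate year."""
    fraction = (year - year_a) / (year_b - year_a)
    return value_a + fraction * (value_b - value_a)


# Illustrative census populations in millions (assumed round figures).
population_1861 = 20.1
population_1871 = 22.7

population_1866 = interpolate(1866, 1861, population_1861, 1871, population_1871)
print(population_1866)

# The half-way point implies equal increases in the two five-year periods.
print(population_1866 - population_1861, population_1871 - population_1866)
```

The estimate is only as good as the assumption of equal increases; the straight line itself carries no evidence on that point.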

Diagrams 12 and 13 illustrate this method of plotting 
graphs. 



Sometimes the time interval is represented by a space 
interval along a horizontal scale and the ordinates, whether 
representing a variable which corresponds to an instant 
of time or to a period of time, are erected at a point in 
the middle of the space corresponding to the unit time 
interval. 



Diagram 14. Number of Insured Workers. (July each year.) 


Diagrams 14 and 15 illustrate this method of graphical 
representation. 

It will readily be appreciated that, when we are plotting
a graph showing large numbers, not much precision is
possible. In Diagram 13, for instance, the points
corresponding to the output of coal in 1919 and 1920 are
at the same distance from the horizontal line, indicating
that the output was the same in the two periods. Actually
these were 229,780,000 tons and 229,532,000 tons
respectively. In such diagrams we must be satisfied that
the gain, obtained by having a graph which enables us
easily to trace the changes which take place, compensates
for the loss suffered by replacing precise figures by a graph
which cannot pretend to delicate precision at all.

Diagram 15. Value of Imports into United Kingdom.
(Vertical scale: £ million, 0 to 1,400; horizontal scale: 1925 to 1932.)

When we are plotting the points on the graph
corresponding to the values of the variable quantity we bear
in mind two things : (1) the fact that we wish to have
a graphical representation corresponding exactly to the
original data, (2) the possibility that what we are really
interested in is the extent of the changes which are taking
place in our variable. In Diagrams 12 and 13 we have graphical
representations corresponding exactly to the original
data. Sometimes we may be more interested in the changes
which take place, rather than in the actual size of the
figures involved. If this is the case, we may need a larger
vertical scale than is consistent with the size of the paper
on which the diagram is being made, and we therefore
sacrifice a part of the diagram by elimination. For instance,
suppose we wished to show graphically the following
figures:

Average Weekly Number of Days in which Coal was Wound in
one Fortnight of each Month of 1913

Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.
5.64  5.61  5.67  5.69  5.64  5.44  5.26  5.54  5.60  5.59  5.56

Diagram 16. Number of Coal-winding Days per week.


These are shown in Diagram 16, plotted in the usual
way, and in Diagram 17 on a larger scale to emphasize
the changes which take place from month to month, but
as this scale would involve the zero of the vertical scale,
with the horizontal scale, placed some considerable distance
from the graph, we eliminate that part, and indicate this
gap as in the diagram. Quite often, however, this indication
of the gap is not shown in the diagram at all, the only
means of noting it is by observing that the zero on the
vertical scale does not coincide with the horizontal base
line on which the time element is shown.

Diagram 17. Number of Coal-winding Days per week.
(Horizontal scale: months, J to D.)
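The effect of eliminating the lower part of the vertical scale can be put in figures. The sketch below is an illustration only: it takes the highest and lowest of the monthly coal-winding figures as read from the table above (April 5.69 and July 5.26) and compares the fall as it appears when the base line stands at zero with the fall as it appears when the base line is moved up to 5.0. The base line of 5.0 is an assumption made for the example, not the scale of Diagram 17.

```python
def apparent_fall_per_cent(high, low, base_line):
    """Fall from high to low as a percentage of the plotted height of the
    higher point, where heights are measured from the base line."""
    return (high - low) / (high - base_line) * 100.0


highest = 5.69  # April, days per week
lowest = 5.26   # July, days per week

# Base line at zero: heights are proportional to the figures themselves.
fall_from_zero = apparent_fall_per_cent(highest, lowest, 0.0)

# Base line raised to 5.0 (assumed): the same change fills most of the picture.
fall_from_raised_base = apparent_fall_per_cent(highest, lowest, 5.0)

print(round(fall_from_zero, 1), round(fall_from_raised_base, 1))
```

The actual fall is under 8 per cent, but on the truncated scale the plotted height falls by more than 60 per cent, which is why the position of the zero must be noted before the graph is read.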

This method of reducing the space occupied by the
diagram, when a large scale is used for the variable which
is being plotted, is very usual, especially where expense
of printing enters into consideration, but when it is adopted
care must be taken in reading the diagram. There is no
doubt that the most important part of the diagram to
impress itself on a person reading it is the graph itself.
He may neglect to observe the vertical scale, or he may
only take note of that after he has had the changes shown
by the graph firmly impressed on his mind. This first
impression may linger even though it may be somewhat
modified by a sight of the scale. It is quite conceivable
that the total impression received from such a diagram
may be substantially different from that obtained if the
diagram had been differently constructed.

Diagram 18. Birth-rate. England and Wales. (Births per thousand
of population.)

Thus Diagram 18, showing changes in the birth-rate since 1905,
indicates a considerable decline in the last twenty or thirty
years, interrupted by the War years and those immediately
succeeding. This decline, it is true, has been considerable,
but is not as great as is suggested on a first reading of this
diagram. If the figures had been plotted on a different



Diagram 19. Value of Imports of Merchandise at Hull, Bristol,
and Newcastle.



[Diagram 20: vertical scale in thousand acres; one of the series is
labelled Sugar Beet.]

a simple series there are two aspects to be considered in
graphical representation, (1) the graph is to show properly
the actual sizes of the numbers in the series, (2) the graph
is to show properly changes in these, so that comparisons
can be made. If we have two series in a diagram, there are
six aspects to be considered, for each series the sizes of the
numbers and the changes in them, and between the series
a comparison of the sizes of the numbers and a comparison
of the changes. So if we have three series to be plotted,
there are twelve points of view to be borne in mind. It is
quite conceivable that a diagram may satisfactorily
succeed in showing the series from some aspects but may
fail in other respects. Diagram 19 showing value of
imports into three ports reasonably serves its purpose from
all points of view, but Diagram 20 is on such a scale that
changes in acreage of beet crops are hardly discernible.
On the other hand this diagram adequately represents
the various acreages of these different crops. It would
be necessary, if we wished to give a better graphical
representation of the acreage under beet, to show this
separately in another diagram. This kind of breakdown
is of course no different from a similar breakdown in
certain tables. For instance, we might represent the facts
relating to output of certain minerals in Great Britain in
the table on p. 185.

Output of Certain Minerals in Great Britain
(in Million Tons)

Year     Coal     Iron Ore and     Tin Ore,
                  Iron Stone       dressed
1920     230          13              -
1921     163           3              -
1922     250           7              -
1923     276          11              -
1924     267          11              -
1925     243          10              -
1926     126           4              -
1927     251          11              -
1928     237          11              -

The output of tin ore is small, and does not run into 
millions of tons, the actual details are : — 


Output of Certain Minerals in Great Britain (Tons)

Year     Coal            Iron Ore and     Tin Ore,
                         Iron Stone       dressed
1920     229,532,081     12,677,670       4,858
1921     163,251,181      3,470,516       1,078
1922     249,606,864      6,836,507         650
1923     276,000,560     10,875,211       1,760
1924     267,118,167     11,050,589       3,547
1925     243,176,231     10,142,878       4,032
1926     126,278,521      4,094,386       3,878
1927     251,232,336     11,206,601       4,321
1928     237,471,931     11,262,323       4,844


Actually, of course, if we wished to show these figures in 
round numbers, we should make a table : — 

Output of Certain Minerals in Great Britain

Year     Coal          Iron Ore and     Tin Ore
         (mn. tons)    Iron Stone       (thousand
                       (mn. tons)       tons)
1920     230           12.7             4.9
1921     163            3.5             1.1
1922     250            6.8             0.7
1923     276           10.9             1.8
1924     267           11.1             3.5
1925     243           10.1             4.0
1926     126            4.1             3.9
1927     251           11.2             4.3
1928     237           11.3             4.8

This change in the degree of precision of the round 
numbers in the table corresponds, of course, to change of 
scale in a diagram. 
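The choice of rounding unit for each column can be illustrated with a few rows of the exact figures. This is a sketch in Python using three of the years from the table in tons; the function and key names are invented for the example.

```python
# Exact outputs in tons for three of the years in the table above.
exact_tons = {
    1920: {"coal": 229_532_081, "iron": 12_677_670, "tin": 4_858},
    1921: {"coal": 163_251_181, "iron": 3_470_516, "tin": 1_078},
    1928: {"coal": 237_471_931, "iron": 11_262_323, "tin": 4_844},
}


def rounded_row(row):
    """Round each series in a unit suited to its own size:
    coal to the nearest million tons, iron ore to a tenth of a
    million tons, tin ore to a tenth of a thousand tons."""
    return {
        "coal_mn_tons": round(row["coal"] / 1_000_000),
        "iron_mn_tons": round(row["iron"] / 1_000_000, 1),
        "tin_thousand_tons": round(row["tin"] / 1_000, 1),
    }


for year, row in exact_tons.items():
    print(year, rounded_row(row))

# Tin ore rounded in the unit used for coal disappears altogether.
print(round(exact_tons[1920]["tin"] / 1_000_000))
```

Rounding tin ore to the nearest million tons gives nothing at all, which is the tabular counterpart of a graph whose scale is too small to show a series.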

When many time series are represented graphically on
the same diagram, some confusion may arise because the
graphs may be nearly superimposed on one another, or
because they may cross and recross. Even though we
carefully differentiate between them by means of broken
and dotted lines or with different coloured inks, it may
be difficult for a person to appreciate the facts which it
is the purpose of the diagram to convey. There is a point
at which the diagram of this kind becomes so confusing
that it serves no useful purpose at all. Any one proposing
to understand it may quite likely give up the attempt
after a first glance at the diagram. Obviously such a
diagram is of no use. A guiding principle to remember in
the construction of diagrams is that they are to serve to
help others to appreciate the facts contained in certain
tables, the idea being that most persons can more readily
appreciate these when they are presented in diagrammatic
form.

Quantities of Wheat (Grain) Consigned from Certain Countries
to the United Kingdom
(Million Cwts.)

Year    Russia   Argentine   U.S.A.   Canada   British   Australia
                                               India
1905     25.6      23.3        6.5      6.6     22.8       10.1
1906     16.1      19.2       22.6     11.2     12.6        7.9
1907     11.4      21.9       19.9     13.2     18.3        8.3
1908      5.1      31.7       25.8     15.8      2.9        5.5
1909     17.8      20.0       15.5     16.6     14.6        9.7
1910     28.9      15.1       10.9     16.4     17.9       13.1
1911     18.1      14.7       12.9     14.4     20.2       13.9
1912      9.0      18.8       20.0     21.6     25.4       11.9
1913      5.0      14.8       34.1     21.8     18.9       10.1


Diagram 21 shows the figures of the table on p. 187.
It is questionable whether such a diagram is a real
help to anyone interested in appreciating this table.

We have now to consider the diagrammatic representation
of two or more series of figures which involve different
units, so that more than one vertical scale is necessary in
the diagram. Difficulties arise here because the impression
conveyed by the diagram depends upon the scales used,
and these may be determined merely by considerations
suggested on account of the necessity of obtaining a
diagram free from obvious defects such as confusion of the
graphs in it. Considerations of this kind might result in a
number of methods of plotting the figures, which would
lead to diagrams of different appearance.

Suppose we wished to make a diagram to show these
figures:


Great Britain: Coal Mining Industry. Output and Man-shifts
Worked. Quarterly Figures 1922 to 1924

Quarters     Output        Man-shifts     Output per
             (mn. tons)    worked         Man-shift
                           (mns.)         (cwts.)
1922   1       57.6          63.2           18.23
       2       53.3          59.8           17.80
       3       58.7          65.4           17.94
       4       64.5          71.3           18.10
1923   1       67.1          73.5           18.25
       2       65.5          73.2           17.90
       3       62.0          71.2           17.42
       4       67.8          76.4           17.76
1924   1       67.0          75.4           17.79
       2       61.6          70.4           17.48
       3       59.2          68.3           17.33
       4       62.4          70.4           17.74


Diagrams 22 and 23 are only two different methods of
presenting these three series. From Diagram 22 we
should get the impression that output per man-shift hardly
changed at all during this period, and that the changes
in the other two series were nearly of the same extent.

[Diagrams 22 and 23. Great Britain. Coal Mining.]

From Diagram 23 we should be impressed by the fact
that changes of a regular character exist in all these
series, and that those in output per man-shift were of the
same degree as those in the number of man-shifts worked.
So long as the guiding principle of making a diagram free
from confusion is observed there are no rules which can
be laid down for the construction of such diagrams in
which different units are involved in the series.

Coal Mining Industry : Output and Man-shifts Worked
(Relative figures)

                       (a)                               (b)
Quarters    Output  Man-shifts  Output per   Output  Man-shifts  Output per
                    worked      Man-shift            worked      Man-shift
1922   1    100     100         100           84.9    82.7       102.6
       2     92.5    94.6        97.6         78.6    78.2       100.2
       3    101.9   103.5        98.4         86.6    85.6       101.0
       4    112.0   112.8        99.3         95.1    93.3       100.9
1923   1    116.5   116.3       100.1         99.0    96.2       102.8
       2    113.7   115.8        98.2         96.6    95.8       100.8
       3    107.6   112.7        95.6         91.4    93.2        98.1
       4    117.7   120.9        97.4        100     100         100
1924   1    116.3   119.3        97.6         98.8    98.7       100.2
       2    106.9   111.4        95.9         90.9    92.1        98.4
       3    102.8   108.1        95.1         87.3    89.4        97.6
       4    108.3   111.4        97.3         92.0    92.1        99.9


One of the simplest methods of avoiding this difficulty
of choice of scales is to sacrifice the units involved and
resort to percentages. This means, of course, that the
actual sizes of the figures in the series are not reproduced
graphically at all, that only the relative figures are shown
in the graphs, and we therefore do not pretend that the
diagram is making any attempt to show the original
table properly. But even here we have an embarrassing
choice to make. Shall we take each figure in a series as a
percentage of the first figure in the series, or the last
figure, or the maximum figure, or the minimum, or the
average of the series ? Each of these different methods
would be useful and appropriate on different occasions,
but whichever we use, we have now the advantage that
we have eliminated altogether the question of scales for
the graphs, because now all our series no longer involve
units. Let us see the result of taking each series in the
last table as (a) percentages of the first figure in the series,
(b) as percentages of the figure for 1923, fourth quarter,
which is the maximum figure in each of the first two series.
We shall have to plot the figures in the table on p. 192.

These are shown in Diagrams 24 and 25. 
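The two sets of relative figures are obtained by simple division. The sketch below computes them for the output series only, taking the quarterly outputs as read from the table of original figures; the function name is invented for the example, and a figure here may differ from the printed table in the last digit where the printed value was rounded differently.

```python
# Quarterly output in million tons, 1922 Q1 to 1924 Q4.
output_mn_tons = [57.6, 53.3, 58.7, 64.5, 67.1, 65.5, 62.0, 67.8, 67.0, 61.6, 59.2, 62.4]


def relative_figures(series, base):
    """Each figure of the series as a percentage of the chosen base figure."""
    return [round(100.0 * value / base, 1) for value in series]


# (a) percentages of the first figure in the series.
relative_to_first = relative_figures(output_mn_tons, output_mn_tons[0])

# (b) percentages of the maximum figure, which falls in 1923, fourth quarter.
relative_to_maximum = relative_figures(output_mn_tons, max(output_mn_tons))

print(relative_to_first)
print(relative_to_maximum)
```

Whichever base is chosen, the ratios between any two figures of the same series are unchanged; only the level at which 100 falls is altered.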

The zero on the percentage scale is not shown, as it has
no particular significance in our diagrams, indeed we may
consider that 100 per cent is the real basic figure in these
series. This may be emphasized by drawing a horizontal
line through 100 per cent on the vertical scale. We could,
of course, superimpose on these diagrams the scales
corresponding to the original units. We have taken, in
Diagram 24, 57.6 million tons of output, 63.2 million
man-shifts, and 18.23 cwt. per man-shift as 100 units in
each case, thus we have these percentages corresponding
to certain figures of output, man-shifts, and output per
man-shift:


      Output                 Man-shifts              Output per Man-shift
Mn. tons   Percentage     Mns.     Percentage     Cwts.     Percentage
   52         90.3         60         94.9        17.3         94.9
   54         93.7         62         98.1        17.4         95.4
   56         97.2         64        101.3        17.5         96.0
   58        100.7         66        104.4        17.6         96.5
   60        104.2         68        107.6        17.7         97.1
   62        107.6         70        110.8        17.8         97.6
   64        111.1         72        114.0        17.9         98.2
   66        114.6         74        117.1        18.0         98.7
   68        118.1         76        120.3        18.1         99.3
                                                  18.2         99.8
                                                  18.3        100.4



Ratio or Logarithmic Scales 


Another method of dealing with the graphical representation
of series involving different units is to plot the figures
on a ratio scale and avoid the trouble occasioned by the
calculation of the percentages. In effect this means
departing altogether from the ordinary kind of scale and
the introduction of a new idea in graphical methods. In
the usual diagram which has so far been considered the
same length on the paper in any part of the scale is
equivalent to the same number of units. Thus if the scale
is, 100 tons equals 1 inch, then the interval on the scale
between 500 and 600 tons is the same (1 inch) as the
interval on the scale between 200 and 300 tons. In a ratio
scale this is not the case, the length of the interval between
two values on the scale depends only on the ratio between
these two values, being proportional to the logarithm of
that ratio.

Ratio Scale

[Scale with marks at .5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, the marks
drawing closer together as the numbers increase.]

On this scale the distance between 1 and 2 is the same
as that between .5 and 1 or 2 and 4 or 4 and 8. In each
case the ratio between these pairs is 2 : 1. Similarly the
same distance is observed between 1 and 3, 3 and 9, 4 and
12. Consecutive points on the scale corresponding to
consecutive integers get closer and closer together as we
increase the numbers from 1. The simplest method of
obtaining such a scale is to determine the position of the
number on the scale from its logarithm. The table of
logarithms below was used in the construction of the scale
above:


Number     Log
  .5      -.301
  1        0
  2        .301
  3        .477
  4        .602
  5        .699
  6        .778
  7        .845
  8        .903
  9        .954
 10       1.0
 11       1.041
 12       1.079


A logarithm of .1 was taken as 1 cm. in the scale. Thus
.301 is 3.01 cm., .477 is 4.77 cm., and so on. The actual
distance between the point marked 1 on the scale and
that marked 12 is 10.79 cm. The distance between the
point marked 6 on the scale and that marked 12 is the
difference between 7.78 cm. and 10.79 cm., corresponding
to the difference between log 6 and log 12, this difference
is, of course, log 2, and the distance between the points
marked 4 and 8 on the scale, the difference between 6.02 cm.
and 9.03 cm. is the same amount. We are using here the
well-known formula

$$\log X - \log Y = \log (X/Y)$$
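The construction just described can be reproduced in a few lines. This is a minimal sketch in Python, assuming the same convention as the text (a logarithm of .1 to 1 cm., with the point marked 1 as origin); the function name is invented for the example.

```python
import math

CM_PER_UNIT_OF_LOG = 10.0  # a logarithm of .1 is taken as 1 cm.


def position_cm(number):
    """Distance in cm. of a number's mark from the mark for 1."""
    return CM_PER_UNIT_OF_LOG * math.log10(number)


for number in [0.5, 1, 2, 3, 4, 6, 8, 12]:
    print(number, round(position_cm(number), 2))

# Equal ratios give equal distances: 6 to 12 and 4 to 8 are both 2 : 1.
distance_6_to_12 = position_cm(12) - position_cm(6)
distance_4_to_8 = position_cm(8) - position_cm(4)
print(round(distance_6_to_12, 2), round(distance_4_to_8, 2))
```

Numbers below 1 fall to the left of the origin, since their logarithms are negative.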

In any ordinary scale the same marks on the scale can
be taken to correspond to different groups of numbers by
adding the same amount to each figure on the original
scale, as below:

     1     2     3     4     5     6     7
     |     |     |     |     |     |     |
    14    15    16    17    18    19    20

Here the same marks on the scale will correspond to the
group of figures 1, 2, 3, 4, . . . or to the group 14, 15, 16,
17, . . . or to any other group obtained by adding the
same amount to each of the original figures. So, in a ratio
scale, the same marks on the scale can be taken to
correspond to different groups of numbers by multiplying
each figure in the original scale by the same amount.

     1    2    3    4    5    6    7    8    9   10
     3    6    9   12   15   18   21   24   27   30
    .4   .8  1.2  1.6  2.0  2.4  2.8  3.2  3.6  4.0

Thus the points on the scale corresponding to 1 to 10 may
also be taken as 3 to 30, or .4 to 4.0, and so on.

Ratio scale graph paper can therefore easily be made 
or printed, just as ordinary squared paper is made, and 
graphs can be plotted on this kind of paper just as on the 
usual graph paper, except that now the graphs are on a 
ratio or logarithmic scale instead of on an ordinary scale. 
This kind of graph paper is often referred to as “logarithmic
paper”. There is no zero on logarithmic paper, as log 0 is
an indefinitely large negative number.

On such paper as this, graphs can be traced in whatever
units are involved without any difficulty, and the resulting
diagrams will show both the actual figures given in the
tables and at the same time will indicate relative values,
both between numbers in the same series, and corresponding
numbers in different series. A change between consecutive
figures in a series indicated on such a diagram by a certain
length will correspond to a relative change of the
corresponding amount. The change from A to B in
Diagram 26 is the same as that between C and D, in each
case an increase of 33.3 per cent is recorded.

If printed logarithmic paper is not available for use
on any occasion, a logarithmic scale diagram can be
constructed by plotting the logarithms of the numbers
in the series instead of the actual numbers, on ordinary
graph paper, and the original units can be shown in the
diagram at the same time. For instance let us take the
table of figures relating to output and man-shifts worked
in the coal mining industry previously quoted on
p. 189 ; we look up the logarithms of the numbers in the
table and plot the latter.

Diagram 26. Logarithmic Paper.


Logarithms

Quarters     Output     Man-shifts     Output per
                                       Man-shift
1922   1     1.760        1.801          1.261
       2     1.727        1.777          1.250
       3     1.769        1.816          1.254
       4     1.810        1.853          1.258
1923   1     1.827        1.866          1.261
       2     1.816        1.865          1.253
       3     1.792        1.852          1.241
       4     1.831        1.883          1.249
1924   1     1.826        1.877          1.250
       2     1.790        1.848          1.243
       3     1.772        1.834          1.239
       4     1.795        1.848          1.249


We construct a scale for logarithms ranging from 1.2
to 1.9 and plot the series in Diagram 27. We can arrange
scales of output, etc., corresponding to the appropriate
figures on the logarithmic scale. As we are not interested
in the logarithms of these numbers except in so far as they
enable us to construct a diagram, the logarithmic scale
in the diagram is eliminated, leaving only the scale showing
the original unit involved. In this way we construct
the diagram on a logarithmic or ratio scale without
having the appropriately ruled paper.
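The looking-up of logarithms and the placing of marks in the original units can be sketched as follows. The figures are those of the four quarters of 1922 from the coal mining table; the choice of scale marks at 55, 60, 65, and 70 million tons is an assumption made for the example and is not taken from Diagram 27.

```python
import math

# 1922 quarterly figures from the coal mining table.
output_mn_tons = [57.6, 53.3, 58.7, 64.5]
man_shifts_mns = [63.2, 59.8, 65.4, 71.3]
output_per_man_shift_cwts = [18.23, 17.80, 17.94, 18.10]


def logs(series):
    """Common logarithms to three places, as in a table of logarithms."""
    return [round(math.log10(value), 3) for value in series]


print(logs(output_mn_tons))
print(logs(man_shifts_mns))
print(logs(output_per_man_shift_cwts))

# Marks for a scale in the original units (assumed values): each mark is
# placed at the height given by its logarithm on the ordinary graph paper.
output_scale_marks = {mark: round(math.log10(mark), 3) for mark in [55, 60, 65, 70]}
print(output_scale_marks)
```

Once the marks in original units are drawn at these heights, the scale of logarithms itself can be rubbed out, as the text describes.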

Diagram 27. Coal Mining Figures on Logarithmic Scale.

Obviously the position of one graph relative to another
in such a diagram is a mere accident dependent on the
units in the original table. Thus output might have
been quoted in the table in cwts., in which case the
characteristics of the logarithms of output would have
been 3, or output per man-shift might have been given in
tons instead of cwts., in which case all the logarithms of this
series would have been negative. The resulting appearance
of the graphs would be unchanged, because the differences
between the logarithms of the series would remain the
same, although the original series might have been
multiplied or divided by a constant amount. This is
equivalent to saying that points on the scale of logarithms
shown as 1.2, 1.3, 1.4, 1.5, etc., could equally be shown
as 3.2, 3.3, 3.4, 3.5, etc., or 1.7, 1.8, 1.9, 2.0, etc., or
any other series obtained from the first by the addition
to each of a constant amount. Diagram 28 shows these
same three series as before on a logarithmic scale. The
graphs have changed relative positions in the diagram,
but are otherwise unaltered.
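That a change of unit moves a graph bodily without altering its shape can be checked on the output per man-shift series. The sketch below takes the 1922 figures in cwts. and converts them to tons at 20 cwts. to the ton; the variable names are invented for the example.

```python
import math

CWTS_PER_TON = 20

# Output per man-shift in cwts., 1922 quarters.
in_cwts = [18.23, 17.80, 17.94, 18.10]
in_tons = [value / CWTS_PER_TON for value in in_cwts]

logs_cwts = [math.log10(value) for value in in_cwts]
logs_tons = [math.log10(value) for value in in_tons]

# Every logarithm is lowered by the same amount, log 20.
shifts = [a - b for a, b in zip(logs_cwts, logs_tons)]
print([round(s, 4) for s in shifts])

# The steps from one quarter to the next are therefore the same in both units.
steps_cwts = [b - a for a, b in zip(logs_cwts, logs_cwts[1:])]
steps_tons = [b - a for a, b in zip(logs_tons, logs_tons[1:])]
print([round(s, 4) for s in steps_cwts])
print([round(s, 4) for s in steps_tons])
```

In tons every figure is less than 1, so all the logarithms are negative, yet the successive differences, and hence the shape of the graph, are untouched.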

Naturally, if we changed the scale of the logarithms
from one graph to another the appearance of the diagram
would not be unaltered. If 1 inch is taken to correspond to
0.1 in a scale of logarithms, then the distance of the
point on this scale corresponding to 1.9 from that
corresponding to 1.8 must be 1 inch ; it is immaterial
where the point corresponding to 1.8 is placed, on the line
showing the scale of logarithms. If the scale is altered
to 2 inches to 0.1 in a scale of logarithms, then of
course the distance between two such points would now
be 2 inches and the appearance of any graph would be
altered.

United Kingdom. Receipts from Super Tax

Year ending       £000       (logs)
31st March
   1911           2,891       3.46
   1912           3,018       3.48
   1913           3,600       3.56
   1914           3,339       3.52
   1915          10,121       4.01
   1916          16,788       4.23
   1917          19,140       4.28
   1918          23,279       4.37
   1919          35,560       4.55
   1920          42,405       4.63
   1921          55,669       4.75
   1922          61,351       4.79
   1923          63,910       4.81
   1924          61,747       4.79
   1925          62,989       4.80
   1926          67,833       4.83

A logarithmic scale is appropriately used also when we
wish to graph a series of figures which change fundamentally
as time goes on, increasing or decreasing at a great rate.
If an ordinary scale were used the smaller figures would
hardly be distinguishable from each other in the graph,
because the scale would be small in order to include
the larger figures. But a logarithmic scale has the apparent
effect of magnifying the scale in its lower part and
contracting it in its upper regions, and on such a scale
a series of this kind would be satisfactorily plotted. As
an illustration let us consider the figures relating to revenue
from super tax. (See the table of super tax receipts above.)

Diagram 29. Super Tax Receipts (£ mn.).

Diagram 30. Super Tax Receipts (logarithmic scale).

These figures are represented graphically on an ordinary
scale and on a logarithmic scale in Diagrams 29 and 30
respectively. The changes between the years 1911-14
are hardly noticeable in 29 but are magnified in 30, the
considerable relative change between 1914 and 1915 is
given due prominence in 30 but not in 29 and the decline
in the rate of increase in later years is brought out in 30
whereas 29 emphasizes the actual increments each year.

Diagram 31. Great Britain. Acreage of Crops.
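The contrast drawn above between Diagrams 29 and 30 for the super tax receipts can be put in figures. The following is a minimal sketch in Python, not part of the original working: the receipts are those tabulated for super tax, and the variable names are our own. An ordinary scale shows the increments, while a logarithmic scale shows the differences of logarithms, that is, the ratios.

```python
import math

# Super tax receipts (£000), years ending 31st March 1911-1926, as tabulated.
years = list(range(1911, 1927))
receipts = [2891, 3018, 3600, 3339, 10121, 16788, 19140, 23279,
            35560, 42405, 55669, 61351, 63910, 61747, 62989, 67833]

logs = [math.log10(r) for r in receipts]

# Ordinary scale: the vertical step is the absolute increment.
increments = [b - a for a, b in zip(receipts, receipts[1:])]
# Logarithmic scale: the vertical step is the difference of logs (the ratio).
log_steps = [b - a for a, b in zip(logs, logs[1:])]

for year, inc, step in zip(years[1:], increments, log_steps):
    print(f"{year}: increment {inc:+7d}   log step {step:+.2f}   ratio {10 ** step:.2f}")
```

On these figures the largest increment falls in the year to 1921, but the largest step on the logarithmic scale is that between 1914 and 1915, when receipts roughly trebled.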

As a further illustration of a slightly different kind we 
may instance the graphing of the following figures relating 
to acreage of wheat and beet 1920-5, which were 
shown on an ordinary scale in Diagram 20. 

Great Britain. Acreage of Crops. 1920-5

           Wheat          Beet
        (000 acres)     (Acres)

1920       1,929          3,045
1921       2,041          8,334
1922       2,032          8,413
1923       1,799         16,923
1924       1,594         22,637
1925       1,548         56,243


These figures are shown on a logarithmic scale in 
Diagram 31, and the graphs present an entirely different 
appearance from the corresponding graphs in Diagram 20. 



Chapter 12 

ANALYSIS OF TIME SERIES 


When we consider the changes with time of a certain 
quantity we are concerned to interpret these changes, and 
to observe how they are related to similar changes which 
are apparent in other time series. For instance when we 
examine the series of figures below giving output of coal 
in Great Britain, we naturally ask ourselves to what 
the changes taking place are due, and how they are related 
to changes in other series. 


Output of Coal. Great Britain, 1873-1928
(Output in million tons)

Year  Output     Year  Output     Year  Output     Year  Output

1873   129       1887   162       1901   219       1915   253
1874   127       1888   170       1902   227       1916   256
1875   133       1889   177       1903   230       1917   248
1876   134       1890   182       1904   232       1918   228
1877   134       1891   185       1905   236       1919   230
1878   133       1892   182       1906   251       1920   230
1879   134       1893   164       1907   268       1921   163
1880   147       1894   188       1908   262       1922   250
1881   154       1895   190       1909   264       1923   276
1882   156       1896   195       1910   264       1924   267
1883   164       1897   202       1911   272       1925   243
1884   161       1898   202       1912   260       1926   126
1885   159       1899   220       1913   287       1927   251
1886   158       1900   225       1914   266       1928   237


We observe that there is, on the whole, a gradual increase
in output of coal, that there are sudden breaks in this
increase of large and small amounts. We are aware that
these figures are the results of the combined operations of
management and workers in the Coal Mining Industry.
We realize that they are stimulated to greater output by
increasing demand for their product. We know that this
demand is not constant, we know also that industrial
troubles from time to time have the effect of stopping
output temporarily, accidents put a mine out of operation,
a mine ceases to be a valuable proposition at the prices
then current, new mines are opened, technical advances
enable coal to be won more easily, and so on. We may say
that the figures in this series result from the operation
of a large number of causes of different kinds, and we must
consider the nature of these causes, in order that we
may determine, if possible, the effect of them; and
so that we may relate the size of the effect produced by a
group of causes on one phenomenon to the similar effect
on another phenomenon.

Let us first consider the kind of causes which are
operating to produce certain effects. First, we must
reach down to fundamental facts. At a given time there
is a certain number of human beings in the world, a certain
amount of habitable and cultivated land, a certain number
of domesticated animals in the service of the human beings.
As time progresses there is a gradual change in the land
under cultivation, a gradual change in the number of
animals. These changes in recent historical times have
all been in the nature of increases, and there have been
consequently gradual increases in demand for certain
commodities with resultant increases in the supply of
them. We may therefore consider that there is a certain
growth factor which is operating to produce gradual
changes in a certain series. The resultant gradually
changing nature of the series is generally referred to as
the secular trend of the series, and this trend must be
considered as linked up to the growth factor referred
to above.

Secondly, operating at the same time as the growth
factor, there is a group of causes which do not operate
continuously, but in a regular spasmodic manner. Thus,
the seasons recur in the same way each year, day follows
night with regularity. In certain areas of the earth there
is a wet season followed by a dry season, certain ports are
regularly frozen up in the winter and as regularly free in
the summer. Seed is sown in spring and the crop is
reaped in autumn. The result of the operation of causes
of this kind is a regular up and down movement in a
series of figures relating to some phenomenon, which has
been observed, and which is affected by this group of causes.
This movement is generally referred to as the seasonal
movement. If the output of coal quarter by quarter in
Great Britain were observed, an up and down movement
of this kind would be noted, due to the changing demand
for coal in winter and summer, and this movement would
be superimposed on the general trend already referred to.

During the nineteenth century a similar fairly regular
up and down movement has been observed in a large
number of time series of economic data, these movements
being repeated at intervals of 7-11 years. The
causes of these periodic or cyclical changes are in doubt,
but there is no doubt about their existence. Those
years when the observed phenomena show upward movements
are referred to as “boom” years and those years
of downward movements as years of “depression” or
“crisis”. The typical up and down movement of this
kind is referred to as the “trade cycle”. The
determination of the extent and regularity of these
movements is one of the matters we must concern ourselves
with in our analysis of time series.

In addition to the group of causes operating to produce
regular up and down movements in our series, there is
another group which operates in an adventitious manner.
This group of causes includes such events as floods
completely ruining a particular area, its crops and houses,
and resulting in the deaths of many of its people; strikes,
resulting in a cessation for long or short periods of
production; deaths of monarchs which might put a stop
to certain events which normally would occur; fires and
earthquakes, wars and revolutions, and so on. All these
causes operate from time to time, there is no regularity
in their operation, the effects may be large or small, but they
certainly exist. In this group also would be considered the
adventitious element in the gradual growth factors or in
the regularly operating spasmodic factors. Thus by a
peculiar combination of wind, sunshine, and rain in a
certain season there may be a bumper crop or a poor
crop of some agricultural product. Or, an invention
or discovery may hasten the gradual growth effect, and
change the nature of a certain series fundamentally.

In any given time series, then, we look for three kinds
of movement:

(1) General trend.

(2) Regular fluctuations
    (a) Seasonal.
    (b) Cyclical.

(3) Irregular fluctuations.


We attempt to analyse a series into these three
constituent parts. When considering the relationship
between one series and another we attempt to relate each
corresponding part of the two series.

A given series may be composed of all three kinds of 
movement, or it may not fluctuate at all. There may be 
a general trend upwards or downwards, or the general 
size of the figures in the series may be the same as time 
goes on. We will consider the general case where all three 
elements are supposed to enter. 

Our analytical problem can be illustrated by considering
how a given series may be made up. Let us suppose
we are dealing with an annual series in which the seasonal
movement (if any) will be concealed, but which is made
up of a general trend, cyclical fluctuations and irregular
fluctuations. Let us suppose that the series is constructed
as in the table below.

Here we have a series in column 5 which has been made
up as shown in the table. The problem we have to consider
in practice is, being given only such a series as that in
column 5, can we reconstruct columns 2, 3, and 4? We
realize that, whatever solution is obtained as a result of
the analysis, this solution will necessarily be approximate
only, but such an approximate solution may be sufficient
for practical purposes in particular cases.

The simplest method of analysis is, first to eliminate
entirely, as far as is possible, all the fluctuations from the
series, whether of a regular nature or an irregular nature,
leaving us only with the general trend. Having this, we
can now obtain the total fluctuation in the series for
each year, since for any given year the value of the series
is equivalent to the trend value plus the fluctuation. The
fluctuations can then be analysed separately in an
endeavour to find the extent of the regular part of the
fluctuations. When this is known simple subtraction
gives us the irregular part of the fluctuations. The
procedure outlined above will be considered in more
detail now.


 (1)      (2)          (3)             (4)            (5)
 Year   General      Cyclical       Irregular      Resulting
         Trend     Fluctuations    Fluctuations      Series

   1     10.0         +1.5            -0.4           11.1
   2     10.1         +1.0            +2.0           13.1
   3     10.2          0              -1.9            8.3
   4     10.3         -1.0            +0.7           10.0
   5     10.4         -1.5            +1.2           10.1
   6     10.5         -1.0            -0.3            9.2
   7     10.6          0              +0.8           11.4
   8     10.7         +1.0            -0.2           11.5
   9     10.8         +1.5            -0.6           11.7
  10     10.9         +1.0            +0.4           12.3
  11     11.0          0               0             11.0
  12     11.1         -1.0            -0.7            9.4
  13     11.2         -1.5            +0.3           10.0
  14     11.3         -1.0             0             10.3
  15     11.4          0              +1.6           13.0
  16     11.5         +1.0            -1.1           11.4
  17     11.6         +1.5            -0.8           12.3
  18     11.7         +1.0            +1.5           14.2
  19     11.8          0              +0.8           12.6
  20     11.9         -1.0            -0.8           10.1
  21     12.0         -1.5            +1.9           12.4
  22     12.1         -1.0            -0.4           10.7
  23     12.2          0              +0.7           12.9
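The construction of column 5 from columns 2 to 4 can be retraced in a few lines. This is a sketch in Python under our own naming. The trend is taken to rise by 0.1 each year and the cyclical movement to repeat every 8 years, as the tabulated figures show, and the irregular column is simply copied from the table.

```python
# Columns (2)-(4) of the table, rebuilt from their evident pattern.
years = list(range(1, 24))
trend = [round(10.0 + 0.1 * (y - 1), 1) for y in years]        # rises by 0.1 a year
cycle_pattern = [1.5, 1.0, 0.0, -1.0, -1.5, -1.0, 0.0, 1.0]    # repeats every 8 years
cyclical = [cycle_pattern[(y - 1) % 8] for y in years]
irregular = [-0.4, 2.0, -1.9, 0.7, 1.2, -0.3, 0.8, -0.2, -0.6, 0.4, 0.0, -0.7,
             0.3, 0.0, 1.6, -1.1, -0.8, 1.5, 0.8, -0.8, 1.9, -0.4, 0.7]

# Column (5): each value is the sum of its three components.
series = [round(t + c + e, 1) for t, c, e in zip(trend, cyclical, irregular)]

for y, value in zip(years, series):
    print(y, value)
```

The analytical problem is the reverse of this computation: given only the list called series here, to recover approximations to the other three.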


The elimination of the fluctuations can be done by
making a graphical representation of the given series,
observing the serrated nature of the graph, and smoothing
out the irregularities by drawing a freehand curve which
appears satisfactorily to describe the general trend of the
graph. This method has the advantage of quickness of



thus two persons might arrive at different conclusions as
to the nature of the trend values, because they have drawn
two different smoothed curves. On the other hand,
the fact that each of them has made his best endeavour
to arrive at the best result, and these differing,
emphasizes the non-precise nature of the conclusion.
As we pointed out, any result which is obtained is
necessarily approximate. The table above has been
graphed in Diagram 32 and the trend has been drawn
in what appears at first sight to be the correct position.
The value on the trend in the first year is 9.6 and in the
last year 12.6; these figures should be contrasted with
10.0 and 12.2 of the table. It is likely that others trying
to estimate the trend from the graph would arrive at
slightly different conclusions from those shown in
Diagram 32.

A more usual method of elimination of the fluctuations
is by using moving averages. We will proceed to disclose
this method, and what it involves.

We are given a series of values of some quantity at
equal time intervals. Groups of n successive values of
the series are averaged, these groups being composed as
follows: the first group consists of the first n items of the
series, the second group consists of the items from the
2nd to the (n + 1)th, the third group consists of the
items from the third to the (n + 2)th, and so on. These
averages are supposed to give the trend values corresponding
on the time scale to the middle period between
the first and nth time intervals, the middle period between
the second and (n + 1)th time intervals, and so on. The
choice of the n values to be averaged is in the hands of
the operator and is arbitrary, though in practice he would
be guided by certain considerations to be discussed later.
Suppose for example we considered the made-up series in
the table above, and calculated 5-yearly moving averages.
The average of the first five items in the series is 10.5
and represents the trend value for the mid-year of the
first five, i.e. year 3.

Year     Series     Sums     Averages

  1       11.1
  2       13.1
  3        8.3      52.6       10.5
  4       10.0      50.7       10.1
  5       10.1      49.0        9.8
  6        9.2      52.2       10.4
  7       11.4
  8       11.5

So, the average of the five items from the second to the
sixth is 10.1 and gives the trend value for the year 4,
half-way between the second and sixth years, and so on.

These averages, which are supposed to give the trend
values, are called Moving Averages.
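The grouping rule just described can be written down compactly. The sketch below is in Python; the function name moving_average and its return format are our own choices, not anything prescribed by the method. It averages each group of n successive items and attaches to each average the mid-point of its group on the time scale. It is applied to column 5 of the made-up series.

```python
def moving_average(series, n):
    """Averages of each group of n successive items.

    Returns (position, average) pairs, where position is the mid-point,
    on a time scale starting at 1, of the group that was averaged.
    """
    out = []
    for start in range(len(series) - n + 1):
        group = series[start:start + n]
        position = start + (n + 1) / 2      # middle of items start+1 .. start+n
        out.append((position, sum(group) / n))
    return out

# Column (5) of the made-up series (years 1 to 23).
series = [11.1, 13.1, 8.3, 10.0, 10.1, 9.2, 11.4, 11.5, 11.7, 12.3, 11.0, 9.4,
          10.0, 10.3, 13.0, 11.4, 12.3, 14.2, 12.6, 10.1, 12.4, 10.7, 12.9]

for year, value in moving_average(series, 5)[:4]:
    print(f"year {year:g}: {value:.1f}")
```

With an odd n the positions are whole years; with an even n they fall half-way between two years, a point taken up further on.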


We must naturally consider how far this method does
in fact perform the function expected of it. It is supposed
to operate on a given series and eliminate the fluctuations,
and give only the general trend. Therefore, if this process
is used on a series which does not contain fluctuations
at all, i.e. a trend series only, it should reproduce the
original series exactly. Also, if this process is used on
a series consisting only of fluctuations these should
be entirely eliminated, leaving us with a series of zeros.
Let us first of all consider a series of figures without
fluctuations which, when plotted, would be graphically
represented by means of a straight line.


Time               Sums    5-Interval    Sums    7-Interval    Sums    8-Interval
Interval  Series    in      Moving        in      Moving        in      Moving
                   5's      Average      7's      Average      8's      Average

  1          1
  2          3
  3          5      25         5
  4          7      35         7          49         7
  4.5                                                           64         8
  5          9      45         9          63         9
  5.5                                                           80        10
  6         11      55        11          77        11
  6.5                                                           96        12
  7         13      65        13          91        13
  7.5                                                          112        14
  8         15      75        15         105        15
  8.5                                                          128        16
  9         17      85        17         119        17
  9.5                                                          144        18
 10         19      95        19         133        19
 10.5                                                          160        20
 11         21     105        21         147        21
 11.5                                                          176        22
 12         23     115        23         161        23
 13         25     125        25
 14         27
 15         29


If these are averaged, however many are grouped, the
resulting averages are exactly the same as the original
figures, when an odd number of items are taken; or if
an even number are averaged the results are trend values
corresponding to positions on the time scale half-way
between the given time intervals. Thus the 8-interval
moving averages give trend values for the time intervals
4.5, 5.5, 6.5, etc.

This simple illustration is merely an example of a general
principle. If the moving average process operates on
a series of values, which when plotted are shown by a
straight line, then the result of the process is to reproduce
the original series, or another series which, when plotted,
gives points on the original line. This is true whether the
line shows increasing, decreasing, or stationary values in
the series. A straight line trend is reproduced exactly
by this process.
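The statement that a straight line is reproduced exactly is easily verified. The following Python sketch uses a helper of our own naming on the series 1, 3, 5, ..., 29. It checks for groups of 5, 7, and 8 items that every average lies on the original line, the 8-interval averages falling at the half-way positions.

```python
def moving_average(series, n):
    """(position, average) for each group of n successive items; positions start at 1."""
    return [(start + (n + 1) / 2, sum(series[start:start + n]) / n)
            for start in range(len(series) - n + 1)]

line = [2 * t - 1 for t in range(1, 16)]       # 1, 3, 5, ..., 29 at times 1..15

def on_line(position):
    """Value of the same straight line at any (possibly half-way) position."""
    return 2 * position - 1

for n in (5, 7, 8):
    averaged = moving_average(line, n)
    # Every average coincides with the line at its own position.
    assert all(abs(value - on_line(pos)) < 1e-12 for pos, value in averaged)
    print(n, averaged[0])
```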

On the other hand, a curved trend is not exactly
reproduced by the process. Let us illustrate with a series
obtained by giving x successive values 1, 2, 3, ... in
the expression 1 + ½x + ½x². The table below shows the
working of the 5-, 6-, 7-interval moving averages process.


Time               Sums    5-Interval    Sums    6-Interval    Sums    7-Interval
Interval  Series    in      Moving        in      Moving        in      Moving
                   5's      Average      6's      Average      7's      Average

  1          2
  2          4
  3          7      40         8
  3.5                                     62       10.3
  4         11      60        12                                91        13
  4.5                                     89       14.8
  5         16      85        17                               126        18
  5.5                                    122       20.3
  6         22     115        23                               168        24
  6.5                                    161       26.8
  7         29     150        30                               217        31
  7.5                                    206       34.3
  8         37     190        38
  9         46
 10         56


In the case of the 5- and 7-interval moving averages
these definitely are not the same as the original series.
The 6-interval moving averages also are not the same as
the values of the original expression obtained when x is
given values 3½, 4½, 5½, 6½, 7½, these being 8.9, 13.4,
18.9, 25.4, 32.9. The process gives figures which are
greater than those in the original series.

Let us now consider a series obtained by giving x
successive values 1, 2, 3, ... in the expression 16 + 3½x
- ½x², and operate on the series with 5- and 7-interval
moving averages.


Time               Sums    5-Interval    Sums    7-Interval
Interval  Series    in      Moving        in      Moving
                   5's      Average      7's      Average

  1         19
  2         21
  3         22     105        21
  4         22     105        21         140        20
  5         21     100        20         133        19
  6         19      90        18         119        17
  7         16      75        15          98        14
  8         12      55        11
  9          7
 10          1


Again, the process does not reproduce the original series;
here all the results are less than the given figures. These
two examples are illustrations of the working of a general
principle: when the moving average process operates
on a trend which is graphically represented by a curve,
the result is another curve different from the original. If
the trend curve is convex to the horizontal time axis, the
moving average curve is above the original; if the trend
curve is concave to the time axis the moving average curve
is below the original. More generally, the process tends to
reduce the curvature in the original graph. The moving
average curve will be closer to the original curve in that
part of it where the curvature is least, and where a curve
is very nearly equivalent to a straight line, the moving
average curve will be practically coincident with it.
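The two curved examples above can be reworked together. In this Python sketch the names convex, concave, and excess are ours, the first two following the usage of the text. We take the series 1 + ½x + ½x² and 16 + 3½x - ½x² for x = 1 to 10 and compute, for odd groupings, the moving average less the original value at the same time interval.

```python
def moving_average(series, n):
    """(position, average) for each group of n successive items; positions start at 1."""
    return [(start + (n + 1) / 2, sum(series[start:start + n]) / n)
            for start in range(len(series) - n + 1)]

convex = [1 + x / 2 + x * x / 2 for x in range(1, 11)]        # 2, 4, 7, 11, ...
concave = [16 + 3.5 * x - x * x / 2 for x in range(1, 11)]    # 19, 21, 22, 22, ...

def excess(series, n):
    """Moving average minus the original value at the same whole time interval (odd n)."""
    return [value - series[int(pos) - 1] for pos, value in moving_average(series, n)]

print(excess(convex, 5), excess(convex, 7))      # positive: the average lies above
print(excess(concave, 5), excess(concave, 7))    # negative: the average lies below
```

For these two series the excess is the same at every interval and grows with the number of items averaged. This is what one would expect for a second-degree expression: for a group of 2m + 1 items the excess works out at m(m + 1)/3 times the coefficient of x².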


Let us take another illustration where the original curve
has varying curvature in different parts.

Time                 5-Interval
Interval   Series     Moving
                      Average

  1          37
  2          59
  3          75         70
  4          86         82
  5          93         90
  6          97         95
  7          99         98
  8         100        100
  9         101        102
 10         103        105
 11         107        110
 12         114        118
 13         125        130
 14         141
 15         163

Diagram 33 shows
the original series together with the result of the 5-interval
moving average process operating on the series. Where
the original series is concave to the time axis, the moving
average curve is below, and where the original is convex,
the moving average is above it. Also, the process gives
results which are closest to the original where the curvature
is not so marked. The difference between the two is five
when the time interval is 3 and 13, and is one when the
time interval is 7 and 9.
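The differences quoted for this series may be checked directly. The Python sketch below is an illustration only, and the dictionary gaps is our own device. It centres each 5-interval average on its middle time interval and records the average less the original figure.

```python
series = [37, 59, 75, 86, 93, 97, 99, 100, 101, 103, 107, 114, 125, 141, 163]
n = 5
half = n // 2

gaps = {}
# Centre each 5-interval average on its middle time interval (3 to 13).
for t in range(1 + half, len(series) - half + 1):
    group = series[t - 1 - half:t + half]
    average = sum(group) / n
    gaps[t] = average - series[t - 1]
    print(t, series[t - 1], average, gaps[t])
```

The sign of the gap changes where the curve passes from one kind of curvature to the other, and the gap is least in the flat middle part of the series.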

Thus the process introduces distortion when the original 
series is curved. This is the great disadvantage of this 
method. On the other hand the amount of distortion is 
not great in those cases where there is not a large amount 
of curvature in the original. 

Let us now consider the effect of using the method on 
a series of regular fluctuations. 


Time               Sums    5-Interval    Sums    7-Interval    Sums    9-Interval
Interval  Series    in      Moving        in      Moving        in      Moving
                   5's      Average      7's      Average      9's      Average

  1         +2
  2         +2
  3          0      +1       +0.2
  4         -1      -3       -0.6          0         0
  5         -2      -4       -0.8          0         0          +4       +0.4
  6         -2      -2       -0.4          0         0          +2       +0.2
  7         +1      +1       +0.2          0         0          -1       -0.1
  8         +2      +3       +0.6          0         0          -3       -0.3
  9         +2      +4       +0.8          0         0          -4       -0.4
 10          0      +1       +0.2          0         0          -1       -0.1
 11         -1      -3       -0.6          0         0          +3       +0.3
 12         -2      -4       -0.8          0         0          +4       +0.4
 13         -2      -2       -0.4          0         0          +2       +0.2
 14         +1      +1       +0.2          0         0          -1       -0.1
 15         +2      +3       +0.6          0         0          -3       -0.3
 16         +2      +4       +0.8          0         0          -4       -0.4
 17          0      +1       +0.2          0         0
 18         -1      -3       -0.6
 19         -2
 20         -2


Diagram 34 shows the original series with the results
of using 5-, 7-, and 9-interval moving averages. The original
series consists of regular fluctuations repeating themselves
at the end of 7 time intervals. The moving average in
groups of 7 absolutely eliminates the fluctuations. The
5- and 9-interval moving averages leave us with series which still
contain fluctuations, although the extent of these has been
much reduced. In the original series the fluctuations
range between ± 2, the reduced fluctuations in the series
of averages of 5 items range between ± .8, and those in
the series of averages of 9 items between ± .4. Further,
the series of averages of 9 items have maxima when the
original series had minima and vice-versa; we may say
that the process of smoothing the fluctuations in the
original series has gone too far, the fluctuations have not




series. 

This example illustrates certain general principles. If
the regular fluctuations exactly repeat themselves at the
end of n time intervals, then an n-time interval moving
average will eliminate the fluctuations entirely. If the
process of moving averages is used, taking groups of less
than n items, the fluctuations are merely reduced but not
eliminated; if more than n items but less than 2n are taken
in the averaging process the fluctuations are still further
reduced, but now the result gives a series with maxima
when the original series had minima and vice-versa.
If 2n items are taken in the averaging process the
fluctuations are again eliminated, and so on. Thus, to
eliminate entirely the fluctuations, we must use the method
of moving averages with n, 2n, 3n, etc., items.

Time               Sums    5-Interval    Sums    9-Interval    Sums   15-Interval
Interval  Series    in      Moving        in      Moving        in      Moving
                   5's      Average      9's      Average     15's      Average

  1         -2
  2          0
  3         +1      -1       -0.2
  4          0       0        0
  5          0      -1       -0.2         -5       -0.6
  6         -1      -1       -0.2         -4       -0.4
  7         -1      -4       -0.8         -5       -0.6
  8         +1      -5       -1.0         -5       -0.6         -4       -0.3
  9         -3      -5       -1.0         -2       -0.2         -3       -0.2
 10         -1      -3       -0.6         -3       -0.3         -3       -0.2
 11         -1      -1       -0.2         -2       -0.2         -3       -0.2
 12         +1      +1       +0.2         -2       -0.2         -1       -0.1
 13         +3      +2       +0.4         -3       -0.3         -2       -0.1
 14         -1      +2       +0.4         +1       +0.1         -1       -0.1
 15          0      +1       +0.2         +4       +0.4         +2       +0.1
 16         -1      -1       -0.2         +4       +0.4         +2       +0.1
 17          0      +2       +0.4         +3       +0.3         +5       +0.3
 18         +1      +1       +0.2         +2       +0.2         +8       +0.5
 19         +2      +2       +0.4         +4       +0.4         +7       +0.5
 20         -1      +4       +0.8         +4       +0.4         +3       +0.2
 21          0      +4       +0.8         +7       +0.8         +2       +0.1
 22         +2      +2       +0.4         +5       +0.6         +4       +0.3
 23         +1      +5       +1.0         +1       +0.1         +2       +0.1
 24          0      +3       +0.6         +1       +0.1
 25         +2      -2       -0.4         +3       +0.3
 26         -2      -1       -0.2         +1       +0.1
 27         -3       0        0
 28         +2      -4       -0.8
 29         +1
 30         -2
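The rule for regular fluctuations (a movement repeating every n intervals is removed by groups of n, 2n, 3n, ... items and only reduced by other groupings) can be tried on the 7-interval series used earlier. This is a Python sketch with a helper of our own naming; the pattern is repeated four times merely to give enough items for a group of 14.

```python
def moving_average(series, n):
    """Averages of each group of n successive items, in order."""
    return [sum(series[i:i + n]) / n for i in range(len(series) - n + 1)]

pattern = [2, 2, 0, -1, -2, -2, 1]     # one complete fluctuation of 7 intervals
series = pattern * 4                   # 28 intervals

for n in (5, 7, 9, 14):
    averaged = moving_average(series, n)
    print(n, round(min(averaged), 2), round(max(averaged), 2))

# The first 9-interval average is centred on interval 5, where the series is at a minimum.
nine = moving_average(series, 9)
print(series[4], round(nine[0], 2))
```

Groups of 7 and 14 give zero throughout, while groups of 9 give a small positive average at interval 5, where the original stands at -2: the reversal of maxima and minima noted in the text.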

Let us now consider the effect of using the method of
moving averages on a series of irregular fluctuations. We
can illustrate with the series in the table above, on
which the moving average process has been used, taking
different numbers of items together.

The results of using the process on this series are
apparent. The size of the fluctuations is reduced, but the
fluctuations are not eliminated. The original fluctuations
range between ± 3, the 5-, 9-, 15-interval moving averages
range between ± 1, +.8 and -.6, +.5 and -.3
respectively. Thus with a larger number of items in the
average the extent of the fluctuations is greatly reduced.
This illustrates a general principle. With adventitious
fluctuations, positive and negative signs occurring at
random, the more items taken together the more chance
there is of the positive and negative amounts occurring
in equal numbers in any group, and, therefore, cancelling
to a large extent. But we should never expect always to
get complete cancellation, the successive sums would be
generally different from zero, but small. The larger the
number of items in the group the larger the denominator
in the average and, therefore, the smaller the average.

Thus the most complete reduction in the extent of the
fluctuations will be obtained when we take as large a
number as possible, of items, in our smoothing process.
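The ranges quoted for the irregular series can be recomputed from its thirty items. The Python sketch below uses our own helper name; the items are those of the series of irregular fluctuations tabulated above. It prints the least and greatest moving average for groups of 5, 9, and 15.

```python
def moving_average(series, n):
    """Averages of each group of n successive items, in order."""
    return [sum(series[i:i + n]) / n for i in range(len(series) - n + 1)]

irregular = [-2, 0, 1, 0, 0, -1, -1, 1, -3, -1, -1, 1, 3, -1, 0,
             -1, 0, 1, 2, -1, 0, 2, 1, 0, 2, -2, -3, 2, 1, -2]

for n in (5, 9, 15):
    averaged = moving_average(irregular, n)
    print(n, round(min(averaged), 1), round(max(averaged), 1))
```

The range narrows as the group lengthens but never closes to zero, in agreement with the principle just stated.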

We may summarize now the results so far obtained of 
using the moving average method. 


Trend. Linear.            Reproduces the trend exactly.

Trend. Curved.            Reduces the curvature; the more items we
                          take in the moving average, the more
                          remote is the result from the original.

Fluctuations. Regular.    That number of items in the average which
                          agrees with the period of the fluctuations
                          eliminates them entirely, or any multiple
                          of that number. Other groupings of items
                          merely reduce the extent of the
                          fluctuations.

Fluctuations. Irregular.  These are never completely eliminated, but
                          they are considerably reduced, and the
                          greater the number of items in the average
                          the more is the reduction in the fluctuations.


When we wish to combine these results into a working
rule for guidance in using the method on a future occasion,
we find ourselves on the horns of a dilemma. If we use too
many items in our averages we shall do well, as far as the
irregular fluctuations are concerned, but we may distort
the trend and may not properly eliminate the regular
fluctuations. If we use too few, we may not reduce
sufficiently the irregular fluctuations, we may not get rid
of the regular fluctuations, but we probably will not
introduce distortion into the trend. In practice, we adopt
a middle course, and take as the number of items in the
averaging process that which will eliminate the regular
fluctuations, hoping that we shall thus reduce very considerably
the random fluctuations without introducing too
much distortion into the trend. Thus, in practice, when
we wish to use the moving average method we search for
periodicity in our series. If, from a diagram, we estimate
that an annual series (say) appears to have a regular up
and down movement repeated at intervals of seven years,
we use a 7-year moving average in order to smooth
out the fluctuations, and hope that the result will give us
a very good approximation to the trend.

Let us consider the series which was graphically
represented in Diagram 32. As far as we can judge from
this graph, there are prominent peaks in years 2, 10, 18,
at intervals of 8 years, and prominent depressions in
years 3, 12, 20, at intervals of 9 and 8 years. We can
reduce the irregular fluctuations by using (say) a 5-year
moving average. This would also have the effect of showing
us if the regular movement was repeated at 8- or 9-year
intervals. This is done and the results in Diagram 36
show peaks at 9 and 17 years and depressions at 5 and 12
or 13 years. Let us then use an 8-year and a 9-year moving
average on the series. The results are shown in the
table below.

                   5-year     8-year     9-year
Year     Series    Moving     Moving     Moving
                   Average    Average    Average

  1       11.1
  2       13.1
  3        8.3      10.5
  4       10.0      10.1
  4.5                          10.6
  5       10.1       9.8                  10.7
  5.5                          10.7
  6        9.2      10.4                  10.8
  6.5                          10.6
  7       11.4      10.8                  10.6
  7.5                          10.9
  8       11.5      11.2                  10.7
  8.5                          10.8
  9       11.7      11.6                  10.7
  9.5                          10.8
 10       12.3      11.2                  10.8
 10.5                          10.9
 11       11.0      10.9                  11.2
 11.5                          11.1
 12        9.4      10.6                  11.2
 12.5                          11.1
 13       10.0      10.7                  11.3
 13.5                          11.2
 14       10.3      10.8                  11.5
 14.5                          11.4
 15       13.0      11.4                  11.6
 15.5                          11.6
 16       11.4      12.2                  11.5
 16.5                          11.7
 17       12.3      12.7                  11.8
 17.5                          12.0
 18       14.2      12.1                  11.9
 18.5                          12.1
 19       12.6      12.3                  12.2
 19.5                          12.1
 20       10.1      12.0
 21       12.4      11.7
 22       10.7
 23       12.9

Diagram 36 also shows the results of the 8-year moving 
average smoothing. The fluctuations are nearly completely 
eliminated. It will be observed that the points on the 
moving average graph are placed in between ordinates 
drawn through points on the time axis corresponding to 
the given years. If we wished to estimate the values on 
the smoothed graph (from the 8-year moving average), 
corresponding to the original values in the series, we 
proceed as follows : — 






The figures in the column headed "Divide by 16" are
the values on the 8-year smooth curve for the years for
which the original figures in the series were given. These
smooth values are really the averages of adjacent pairs of
the figures obtained as a result of the 8-year moving
average process. It is interesting to compare the results
with the original trend figures from which our series was
obtained. These last are shown in the table on p. 228 side
by side with the smooth series. They differ at most by 0.4;
the difference is, of course, due to the fact that the
irregular fluctuations are not completely eliminated.
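
The routine just described (sum in 8's, add adjacent sums in pairs, divide by 16) can be sketched in Python for any even period. This is only an illustrative sketch: the function name `centred_moving_average` is an assumption of this example, and the demonstration uses only the ten values for years 14 to 23 of the series, so it yields centred values for years 18 and 19 alone.

```python
def centred_moving_average(values, period):
    """Centre an even-period moving average on the original time points.

    Steps: sum the values in runs of `period`, add adjacent sums in
    pairs, and divide each pair by 2 * period. Positions for which no
    centred value exists (the first and last period // 2 points) are
    returned as None.
    """
    if period <= 0 or period % 2 != 0:
        raise ValueError("period must be a positive even number")
    n = len(values)
    # "Sums in period's": each sum sits between two of the given years.
    sums = [sum(values[i:i + period]) for i in range(n - period + 1)]
    # "Add in pairs": each pair is centred on one of the given years.
    pairs = [a + b for a, b in zip(sums, sums[1:])]
    trend = [None] * n
    half = period // 2
    for k, total in enumerate(pairs):
        trend[k + half] = total / (2 * period)
    return trend


# Years 14 to 23 of the series (illustrative subset only).
series = [10.3, 13.0, 11.4, 12.3, 14.2, 12.6, 10.1, 12.4, 10.7, 12.9]
smooth = centred_moving_average(series, 8)

for year, value in zip(range(14, 24), smooth):
    if value is not None:
        print(year, round(value, 2))
```

With period 8 the divisor is 16, as in the column heading; with a 4-quarter period it would be 8.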


One disadvantage in the use of this method is that
trend values corresponding to a first and last group of
values of the series are not obtained. On the other hand,
it is a method which is easy of application, and produces
results which are the same whoever is operating. This
last is a distinct advantage over the freehand curve method,
which has the merit of being simpler.


Year   Index   7-year   9-year    Year   Index   7-year   9-year
      Number   Moving   Moving          Number   Moving   Moving
              Average  Average                  Average  Average

1846      89                      1880      88     86.1     86.0
1847      95                      1881      85     83.6     83.4
1848      78                      1882      84     81.4     80.7
1849      74     80.9             1883      82     79.4     78.6
1850      77     81.7     84.8    1884      76     76.6     77.1
1851      75     82.7     86.1    1885      72     74.4     75.3
1852      78     86.0     86.8    1886      69     72.7     73.9
1853      95     89.9     89.8    1887      68     71.3     72.6
1854     102     93.9     91.7    1888      70     70.7     71.0
1855     101     96.1     93.6    1889      72     70.1     70.1
1856     101     98.4     96.2    1890      72     70.0     69.1
1857     105     99.0     98.4    1891      72     69.3     68.3
1858      91     98.4     99.2    1892      68     68.1     67.6
1859      94     98.4     99.2    1893      68     66.6     66.7
1860      99     98.7     99.7    1894      63     65.1     65.8
1861      98     98.7     99.7    1895      62     64.0     65.3
1862     101    100.1     99.3    1896      61     64.0     65.7
1863     103    101.3    100.3    1897      62     65.0     65.9
1864     105    101.4    100.9    1898      64     66.0     66.0
1865     101    101.6    100.8    1899      68     67.0     66.7
1866     102    101.1    100.6    1900      75     68.1     67.6
1867     100    100.1    100.4    1901      70     69.3     68.8
1868      99     99.4    101.1    1902      69     70.4     70.4
1869      98    100.6    101.8    1903      69     71.7     72.2
1870      96    101.9    101.9    1904      70     72.4     72.8
1871     100    102.1    101.2    1905      72     72.9     72.7
1872     109    101.7    100.7    1906      77     73.6     73.6
1873     111    101.3    100.1    1907      80     74.9     74.8
1874     102    101.0     98.9    1908      73     76.3     76.6
1875      96     99.1     97.4    1909      74     78.1     78.2
1876      95     95.4     96.1    1910      78     79.3
1877      94     92.1     93.4    1911      80
1878      87     89.7     90.4    1912      85
1879      83     88.0     88.2    1913      85


When we are dealing with certain series of economic
data, although we may see that fairly regular fluctuations
exist, we may have some doubt as to the exact periodicity
of the movement; thus, if we consider a graph of the
Statist Wholesale Price Index from 1846-1913, Diagram 37,
we can see an up and down movement repeated more or less
every seven years. Thus there are peaks in 1900 and 1907,
in 1873 and 1880, in 1857 and 1864. But the interval
between 1864 and 1873 is nine years. There are depressions
in 1858 and 1870 (12 years' interval) and 1879 (9 years'
interval) and 1887 (8 years' interval) and 1896 (9 years'
interval). The movement is not perfectly regular, and if
we use a 9-year moving average it will give satisfactory
results in some parts of the range, but in others a 7-year
moving average might be preferable. Let us consider
the results of operating both methods in the series (table
on p. 231).


Year  Series  Trend  Fluctuation    Year  Series  Trend  Fluctuation

1846      89                        1880      88     86           +2
1847      95                        1881      85     83           +2
1848      78                        1882      84     81           +3
1849      74                        1883      82     79           +3
1850      77     85           -8    1884      76     77           -1
1851      75     86          -11    1885      72     75           -3
1852      78     87           -9    1886      69     74           -5
1853      95     90           +5    1887      68     73           -5
1854     102     92          +10    1888      70     71           -1
1855     101     94           +7    1889      72     70           +2
1856     101     96           +5    1890      72     69           +3
1857     105     98           +7    1891      72     68           +4
1858      91     99           -8    1892      68     68            0
1859      94     99           -5    1893      68     67           +1
1860      99    100           -1    1894      63     66           -3
1861      98    100           -2    1895      62     65           -3
1862     101    100           +1    1896      61     66           -5
1863     103    100           +3    1897      62     66           -4
1864     105    101           +4    1898      64     66           -2
1865     101    101            0    1899      68     67           +1
1866     102    101           +1    1900      75     68           +7
1867     100    101           -1    1901      70     69           +1
1868      99    101           -2    1902      69     70           -1
1869      98    102           -4    1903      69     72           -3
1870      96    102           -6    1904      70     72           -2
1871     100    101           -1    1905      72     73           -1
1872     109    101           +8    1906      77     74           +3
1873     111    100          +11    1907      80     75           +5
1874     102     99           +3    1908      73     76           -3
1875      96     97           -1    1909      74     78           -4
1876      95     96           -1    1910      78     79           -1
1877      94     93           +1    1911      80
1878      87     90           -3    1912      85
1879      83     88           -5    1913      85


The results of using the moving average process are 
shown in Diagram 37. There are still apparent some slight 
perturbations in the 7-year graph, and up to 1898 it is 
probably best to take the results of the 9-year smoothing 
as the trend, and after that the 7-year smoothing. We may 
even smooth out any slight irregularity remaining in the 
results of using moving averages, e.g. in 1862 and 1867, 
and obtain the series on p. 232 as the trend. The table 
shows this together with the original series and the 
fluctuations from the trend. 
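
For an odd period the moving average falls directly on the middle year, so no pairing step is needed. The following Python sketch illustrates this with the first thirteen Statist index numbers (1846 to 1858); the function name `moving_average` is an assumption of this example, and the fluctuations printed are taken from the unrounded 9-year average rather than from the hand-smoothed trend of the final table.

```python
def moving_average(values, period):
    """Odd-period moving average, each value placed at the middle year.

    The first and last period // 2 positions have no value (None).
    """
    if period <= 0 or period % 2 == 0:
        raise ValueError("period must be a positive odd number")
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        trend[i] = sum(values[i - half:i + half + 1]) / period
    return trend


# Statist Wholesale Price Index, 1846 to 1858, from the table.
years = list(range(1846, 1859))
index = [89, 95, 78, 74, 77, 75, 78, 95, 102, 101, 101, 105, 91]

ma7 = moving_average(index, 7)
ma9 = moving_average(index, 9)

for year, value, t7, t9 in zip(years, index, ma7, ma9):
    shown7 = None if t7 is None else round(t7, 1)
    shown9 = None if t9 is None else round(t9, 1)
    # Fluctuation = original figure minus trend (here the 9-year average).
    fluctuation = None if t9 is None else round(value - t9, 1)
    print(year, value, shown7, shown9, fluctuation)
```

The `None` entries at each end show the disadvantage noted earlier: no trend values are obtained for the first and last few years.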

Let us now consider the problem of obtaining the extent
of the regular movement, when it exists, in a series. We
can illustrate this by dealing with a series possessing
a seasonal movement. Such a series is that below, giving
output of coal in Great Britain, quarterly, which we would
anticipate will exhibit a seasonal movement on account of
the changing demand for coal with the seasons. We operate
on the series with a 4-quarter moving average, in order to
obtain the trend. Knowing this we can get the fluctuations
from it, and analyse these in order to arrive at the seasonal
movement.


Great Britain: Quarterly Output of Coal (mn. tons)

Year  Quarter  Output  Sums in  Add in    Divide   Fluctuations
                       4's      pairs     by 8     from the
                                          (Trend)  Trend

1927        1    68.3
            2    62.6
                         255.3
            3    61.1              507.7     63.5          -2.4
                         252.4
            4    63.3              500.1     62.5          +0.8
                         247.7
1928        1    65.4              490.7     61.3          +4.1
                         243.0
            2    57.9              484.2     60.5          -2.6
                         241.2
            3    56.4              485.1     60.6          -4.2
                         243.9
            4    61.5              492.6     61.6          -0.1
                         248.7
1929        1    68.1              503.8     63.0          +5.1
                         255.1
            2    62.7              515.7     64.5          -1.8
                         260.6
            3    62.8              523.2     65.4          -2.6
                         262.6
            4    67.0              521.6     65.2          +1.8
                         259.0
1930        1    70.1              511.5     63.9          +6.2
                         252.5
            2    59.1              499.6     62.4          -3.3
                         247.1
            3    56.3              483.6     60.4          -4.1
                         236.5
            4    61.6              468.7     58.6          +3.0
                         232.2
1931        1    59.5              459.2     57.4          +2.1
                         227.0
            2    54.8              450.4     56.3          -1.5
                         223.4
            3    51.1
            4    58.0


This series of fluctuations consists of both regular and
irregular perturbations, the periodicity of the regular
movement being four quarters. We arrange the fluctuations
as follows:

Fluctuations from the Trend

Quarters       1       2       3       4

1927                        -2.4    +0.8
1928        +4.1    -2.6    -4.2    -0.1
1929        +5.1    -1.8    -2.6    +1.8
1930        +6.2    -3.3    -4.1    +3.0
1931        +2.1    -1.5

Totals     +17.5    -9.2   -13.3    +5.5
Averages    +4.4    -2.3    -3.3    +1.4


The last line gives the regular seasonal movement. 
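
The totals and averages of this arrangement can be checked with a short Python sketch. The data are the fluctuations from the trend given for the coal series; the dictionary layout and the function name `seasonal_movement` are assumptions of this example.

```python
# Fluctuations from the trend, keyed by year and then by quarter.
fluctuations = {
    1927: {3: -2.4, 4: 0.8},
    1928: {1: 4.1, 2: -2.6, 3: -4.2, 4: -0.1},
    1929: {1: 5.1, 2: -1.8, 3: -2.6, 4: 1.8},
    1930: {1: 6.2, 2: -3.3, 3: -4.1, 4: 3.0},
    1931: {1: 2.1, 2: -1.5},
}


def seasonal_movement(fluctuations, periods=4):
    """Average the fluctuations separately for each quarter (or month)."""
    totals = {p: 0.0 for p in range(1, periods + 1)}
    counts = {p: 0 for p in range(1, periods + 1)}
    for by_period in fluctuations.values():
        for period, value in by_period.items():
            totals[period] += value
            counts[period] += 1
    # Only periods with at least one observation receive an average.
    return {p: totals[p] / counts[p] for p in totals if counts[p]}


seasonal = seasonal_movement(fluctuations)
for quarter, value in seasonal.items():
    print(quarter, round(value, 1))
```

Rounded to one decimal place, the four averages agree with the last line of the table: +4.4, -2.3, -3.3 and +1.4.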


Each first quarter's figure is the result of the regular
influences plus other adventitious influences
which do not operate in a regular periodic fashion. We
assume, with respect to the latter, that over a long time
their influence will sometimes be to increase the normal
movement and sometimes to diminish it, and the
results of these factors will more or less cancel when we
add them together. Naturally the more years' figures
we can take, the more likely is this cancellation to be
complete and, at any rate, the average of this random part
of the fluctuations is likely to be very close to zero.
So we assume that the average of the first quarter's figures
over a period of years represents only the regular movement,
that, by averaging, the random fluctuations have
been eliminated. We do likewise with the other
quarters' figures, and assume that the result gives only
the regular movement. In this kind of calculation
we take as many years' experience as is available, in order
to reduce as much as possible the random element left in
the fluctuations after the trend has been taken away from
the original series. In the simple illustration shown here
there is only the experience of four years from which to
estimate the regular movement, and we should not, in
practice, expect the results to be very precise.

We will suppose that the results obtained do actually
give us the seasonal movement, then we can get the
irregular movement and present the final results of the
analysis as in the table on p. 235.


Year  Quarter   Trend   Seasonal     Irregular   Original
                        Movement  Fluctuations     Series

1927        3    63.5       -3.3          +0.9       61.1
            4    62.5       +1.4          -0.6       63.3
1928        1    61.3       +4.4          -0.3       65.4
            2    60.5       -2.3          -0.3       57.9
            3    60.6       -3.3          -0.9       56.4
            4    61.6       +1.4          -1.5       61.5
1929        1    63.0       +4.4          +0.7       68.1
            2    64.5       -2.3          +0.5       62.7
            3    65.4       -3.3          +0.7       62.8
            4    65.2       +1.4          +0.4       67.0
1930        1    63.9       +4.4          +1.8       70.1
            2    62.4       -2.3          -1.0       59.1
            3    60.4       -3.3          -0.8       56.3
            4    58.6       +1.4          +1.6       61.6
1931        1    57.4       +4.4          -2.3       59.5
            2    56.3       -2.3          +0.8       54.8


This method is applied in exactly the same kind of
way to a series of monthly figures. We may illustrate
with the monthly Cost of Living Index Numbers of the
Ministry of Labour. (Table on p. 236.)


Monthly Cost of Living Index Numbers of the Ministry of Labour

Year  Month  Index  12-month  Fluctuations    Year  Month  Index  12-month  Fluctuations
                    M.A.                                          M.A.
                    Trend                                         Trend

1927      1    175                            1930      1    166     161.7          +4.3
          2    172                                      2    164     161.2          +2.8
          3    171                                      3    161     160.7          +0.3
          4    165                                      4    157     160.0          -3.0
          5    164                                      5    155     159.3          -4.3
          6    163                                      6    154     158.3          -4.3
          7    166     167.2          -1.2              7    155     157.3          -2.3
          8    164     166.7          -2.7              8    157     156.2          +0.8
          9    165     166.1          -1.1              9    157     155.3          +1.7
         10    167     165.8          +1.2             10    156     154.4          +1.6
         11    169     165.8          +3.2             11    157     153.7          +3.3
         12    169     165.8          +3.2             12    155     153.0          +2.0
1928      1    168     165.9          +2.1    1931      1    153     152.2          +0.8
          2    166     165.9          +0.1              2    152     151.4          +0.6
          3    164     165.9          -1.9              3    150     150.4          -0.4
          4    164     165.9          -1.9              4    147     149.5          -2.5
          5    164     165.8          -1.8              5    147     148.5          -1.5
          6    165     165.6          -0.6              6    145     147.8          -2.8
          7    165     165.5          -0.5              7    147     147.2          -0.2
          8    165     165.5          -0.5              8    145     146.8          -1.8
          9    165     165.5          -0.5              9    145     146.4          -1.4
         10    166     165.5          +0.5             10    145     146.1          -1.1
         11    167     165.3          +1.7             11    146     145.8          +0.2
         12    168     165.0          +3.0             12    148     145.5          +2.5
1929      1    167     164.6          +2.4    1932      1    147     145.2          +1.8
          2    165     164.3          +0.7              2    147     144.9          +2.1
          3    166     164.2          +1.8              3    146     144.6          +1.4
          4    162     164.1          -2.1              4    144     144.3          -0.3
          5    161     164.1          -3.1              5    143     144.1          -1.1
          6    160     164.0          -4.0              6    142     143.8          -1.8
          7    161     164.0          -3.0              7    143
          8    163     163.9          -0.9              8    141
          9    164     163.6          +0.4              9    141
         10    165     163.2          +1.8             10    143
         11    167     162.8          +4.2             11    143
         12    167     162.2          +4.8             12    143
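
For monthly figures the only change is the period: sums are taken in 12's, added in pairs and divided by 24. Adding two adjacent 12-month sums is the same as taking thirteen consecutive months with the two end months counted once and the eleven inner months counted twice, and the Python sketch below uses that equivalent form. It is an illustration only: the function name `centred_trend` is an assumption, the data are the index numbers for January 1927 to June 1928, and September 1927 is taken as 165, the value consistent with the tabulated trend of 166.1 and fluctuation of -1.1.

```python
def centred_trend(values, period=12):
    """Centred moving average for an even period.

    Equivalent to summing in `period`s, adding adjacent sums in pairs
    and dividing by 2 * period: the two end values of each window of
    period + 1 points are counted once, the inner values twice.
    """
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        total = 2 * sum(window) - window[0] - window[-1]
        trend[i] = total / (2 * period)
    return trend


# Cost of Living Index, January 1927 to June 1928 (Sept. 1927 = 165, assumed).
index = [175, 172, 171, 165, 164, 163, 166, 164, 165, 167, 169, 169,
         168, 166, 164, 164, 164, 165]

trend = centred_trend(index, 12)

# Round the trend to one decimal first, as the table does, then subtract.
rows = []
for month, (value, t) in enumerate(zip(index, trend), start=1):
    if t is not None:
        shown = round(t, 1)
        rows.append((month, value, shown, round(value - shown, 1)))

for row in rows:
    print(row)
```

Only July to December 1927 receive a trend value from these eighteen months, since six months are lost at each end.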


Fluctuations: Calculation of Regular Seasonal Movement

Month    1927    1928    1929    1930    1931    1932   Totals   Regular
                                                                Seasonal
                                                                Movement

    1            +2.1    +2.4    +4.3    +0.8    +1.8    +11.4      +2.3
    2            +0.1    +0.7    +2.8    +0.6    +2.1     +6.3      +1.3
    3            -1.9    +1.8    +0.3    -0.4    +1.4     +1.2      +0.2
    4            -1.9    -2.1    -3.0    -2.5    -0.3     -9.8      -2.0
    5            -1.8    -3.1    -4.3    -1.5    -1.1    -11.8      -2.4
    6            -0.6    -4.0    -4.3    -2.8    -1.8    -13.5      -2.7
    7    -1.2    -0.5    -3.0    -2.3    -0.2             -7.2      -1.4
    8    -2.7    -0.5    -0.9    +0.8    -1.8             -5.1      -1.0
    9    -1.1    -0.5    +0.4    +1.7    -1.4             -0.9      -0.2
   10    +1.2    +0.5    +1.8    +1.6    -1.1             +4.0      +0.8
   11    +3.2    +1.7    +4.2    +3.3    +0.2            +12.6      +2.5
   12    +3.2    +3.0    +4.8    +2.0    +2.5            +15.5      +3.1



There is, in the Cost of Living Index Number, a seasonal
movement which raises and lowers the number by as much
as 3 points, the maximum increase being in December, and
the maximum decrease in June. This seasonal movement
is in the main due to higher prices in winter of certain
foodstuffs.

Diagram 38 shows the change in the Cost of Living
Index Number.










